Insydium X-Particles Benchmark Results & CPU Performance compared

X-Particles is becoming more popular by the day, and we are seeing a lot of requests from readers looking to build a PC for X-Particles workloads.

Insydium’s plugin is a particle and VFX system for Cinema 4D that comes with a treasure trove of features: ParticleFX, Smoke & Fire Simulation, Fluid Simulation, Dynamics, and Cloth Simulation are just part of what X-Particles has to offer.

Before we can make hardware recommendations for X-Particles, we need to take a look at how X-Particles uses a workstation’s hardware.

How does X-Particles utilize your PC’s components?

Apart from Cinema 4D’s Viewport Performance which is responsible for displaying a preview of your particle setup, the main performance demand comes from caching your particle system.

X-Particles utilizes the CPU to cache each frame and stores the cache either in your RAM or on your permanent storage. Although caches can get quite big, it’s straightforward to just get a larger SSD to make room for them.

What’s not all that easy, though, is decreasing the time it takes to build an X-Particles cache.

The reason is as simple as it is complex:

X-Particles comes with many different features and solvers, and each of these makes use of your CPU in different ways. Some parts might be easily optimized to make use of high-core-count CPUs, while others see no benefit at all.

In fact, as we’ll see in the benchmarks below, CPUs that are great for one specific X-Particles feature (e.g. Fluid Sim) might be particularly bad at another (e.g. Granular Advection).

X-Particle Benchmarks

For our Benchmarking runs we used 5 Scenes from the Content Repository, opened them in Cinema 4D R21, and ran a cache build without changing anything else.

X-Particles Version: Build 895 (Trial).

Here are the results:

Trail Display Flowfield

CPU # of Cores Base Clock (GHz) Cache Build Time (seconds)
AMD Ryzen 9 5900X 12 3.4 18
Intel i9 11900K 8 3.5 22
Intel i5 11600K 6 3.9 25
AMD Ryzen 3900X 12 3.8 25
AMD Threadripper 3990X 64 2.9 29
AMD Threadripper 2950X 16 3.5 39
AMD Threadripper 1950X 16 3.4 40
Apple Mac Studio 2022 – M1 Max 10 3.2 41

The Trail Display Flowfield Scene is an allrounder Scene as it makes use of many different X-Particles Features. There’s a Smoke & Fire Sim present alongside a Fluid Simulation and Particle Trails.

Looking at the scores, there’s no question that this Scene heavily favors CPUs with high Core Clocks. Lower clocking CPUs such as the Threadripper 3990X rank at the bottom, even though they have a decent number of cores.

The scene does have some Fluid Simulation elements that make great use of high-core-count CPUs, but in the end it’s the weakest link that counts. And the weakest link is the X-Particles features that can’t be parallelized – making the 64-Core Threadripper 3990X a bad performer in this scene.

ExplosiaFX Particle Curl

CPU # of Cores Base Clock (GHz) Cache Build Time (seconds)
Intel i9 11900K 8 3.5 173
Apple Mac Studio 2022 – M1 Max 10 3.2 183
AMD Ryzen 9 5900X 12 3.4 195
Intel i5 11600K 6 3.9 196
AMD Ryzen 3900X 12 3.8 240
AMD Threadripper 3990X 64 2.9 248
AMD Threadripper 2950X 16 3.5 271
AMD Threadripper 1950X 16 3.4 289

Both Scenes, the ExplosiaFX Particle Curl and the ExplosiaFX Granular Advection below, are highly dependent on single-core performance. If these types of particle setups are your bread and butter, look for mainstream CPUs that clock high and have a moderate number of cores.

ExplosiaFX Granular Advection

CPU # of Cores Base Clock (GHz) Cache Build Time (seconds)
AMD Ryzen 9 5900X 12 3.4 53
Intel i9 11900K 8 3.5 53
Intel i5 11600K 6 3.9 55
AMD Ryzen 3900X 12 3.8 67
AMD Threadripper 3990X 64 2.9 73
Apple Mac Studio 2022 – M1 Max 10 3.2 76
AMD Threadripper 2950X 16 3.5 79
AMD Threadripper 1950X 16 3.4 85

Fluidflip APIC Splash Tank

CPU # of Cores Base Clock (GHz) Cache Build Time (seconds)
AMD Threadripper 3990X 64 2.9 129
AMD Ryzen 9 5900X 12 3.4 198
AMD Ryzen 3900X 12 3.8 265
AMD Threadripper 2950X 16 3.5 318
Intel i9 11900K 8 3.5 319
AMD Threadripper 1950X 16 3.4 345
Intel i5 11600K 6 3.9 380
Apple Mac Studio 2022 – M1 Max 10 3.2 440

The Fluid Simulation Benchmark is the easiest to read of the 5 Scenes we tested. The 64-Core Threadripper 3990X easily pulls ahead, as Fluid Simulation seems to be well optimized for parallel computing.

Interestingly the 16-Core Threadripper 2950X is slower than the 12-Core 5900X, but given that the 5900X is two generations newer, it makes sense. The generational leap and higher clocks let the 5900X outperform the 2950X in X-Particles Fluid Sim.

Dynamics Floating Objects

CPU # of Cores Base Clock (GHz) Cache Build Time (seconds)
AMD Ryzen 9 5900X 12 3.4 23
AMD Ryzen 3900X 12 3.8 24
AMD Threadripper 3990X 64 2.9 25
Intel i9 11900K 8 3.5 27
Intel i5 11600K 6 3.9 29
AMD Threadripper 2950X 16 3.5 41
AMD Threadripper 1950X 16 3.4 46
Apple Mac Studio 2022 – M1 Max 10 3.2 66

The Dynamics Floating Objects Scene is somewhat of a mixed bag. It heavily favors CPUs with high single-core performance but does make some use of lots of cores. The Threadripper 3990X, albeit sporting the lowest core clocks, is able to rank third, right behind the two Ryzens, thanks to its many cores that tip the scale in its favor.

What’s the best CPU for X-Particles?

If you want to make use of the full range of X-Particles Features, you’ll need a CPU that has both as many cores as possible and clocks as high as possible.

In other words, you need a CPU that doesn’t exist.

The more cores a CPU has, the lower it usually clocks. That’s just how thermal and power limits work. Right now, great allrounder CPUs for X-Particles are the AMD Ryzen 9 5900X and 5950X with 12 to 16 Cores that have high single-core performance.

On the Intel side of things, the 10900K and 11900K CPUs are excellent for X-Particles, although both have “only” 8-10 Cores.

As can be deduced from the Benchmarks above, most X-Particles features benefit more from high clocks than from lots of cores. So unless you know you’ll use X-Particles only to create fluid sims, you should stick to high-clocking mainstream CPUs with a moderate Core-Count.

Fluid Simulations, though, are quite scalable and make decent use of the many cores in a CPU such as the AMD Threadripper 3960X.

The AMD Threadripper 3990X that we tested, though, can hardly make use of its full 64 Cores and is wasted on X-Particles.

Here’s a Screenshot of the 3990X’s CPU Utilization while building the Fluid Sim Cache (Fluidflip APIC Splash Tank Scene):

Only around 40% CPU utilization, give or take, when building Fluid Sim caches in X-Particles on a 64-Core CPU.

The above Scores and CPU Recommendations fall in line with our PC-Recommendations for Cinema 4D and 3D Modeling:

  • Get a mainstream high-clocking CPU (5900X, 5950X, 11900K, 10900K) for most cases
  • Get a high-end Threadripper with lots of Cores (3960X, 3970X) if you make use of the few features that scale nicely (CPU Rendering, Fluid Sim)

Here’s what our resident CPU expert says about these recommendations:

Although using Clock Speeds is an easy way to compare the performance of CPUs of a similar generation, it’s only an approximation. IPC (Instructions per cycle) and other hardware-level features are what count, and you’ll find that some CPUs are capable of easily outperforming the competition or older generations even though their clock speeds are lower. Always consult benchmarks before making a buying decision.

Over to you

You can download the 5 Scenes we used here and help us fill in the gaps. Let us know the following in the comments:

  • Scene
  • CPU (@Stock clocks or OC?)
  • Cache Build Time
  • Cinema 4D Version
  • X-Particles Version

Thanks! 🙂

CPU Performance Tested

Our benchmark tests show you which CPUs offer the best audio performance, whether you’re on a budget or money’s no object.

Every few years, as resource-hungry software takes advantage of newer, more powerful computers, the same questions arise about the computer that lies at the heart of your studio: do I need to upgrade this machine, or should I replace it entirely? And what kind of improvement can I get for my money?

In this article, I’ll help you to answer those questions by taking you through the results of some CPU performance tests I carried out recently. I included a range of chips from the more cost-conscious to the fastest chips money can buy, and have looked at both mobile and desktop versions, as well as new chips and some that are now three or four years old. The chart makes clear which CPUs offer the best performance, but says nothing about cost and certain other important considerations. So I’ll also explain in these pages a bit more about what makes a good CPU, and the implications of your CPU choice for the rest of the system (in terms of heat, cooling and noise, for example).

The CPU (Central Processing Unit) is just one factor that influences your computer’s audio performance. There are others, not covered here, including the type of storage used and the amount and speed of RAM, but the CPU plays a hugely important part.

The days of being able to identify the best CPU by looking at the headline clock speed are long gone. Despite the clock speeds not having increased dramatically in recent years, the processor manufacturers have continued to improve their chips in other ways. (See the Intel & AMD box.) This means that when comparing different generations of similarly named chips, the clock speed and core count may appear similar, but the real-world audio performance will vary significantly. When moving on from a product that’s three, four or perhaps even more years old to a new setup, it’s therefore very difficult to figure out what sort of performance improvement your money will buy — and it’s particularly difficult to establish at what point spending more money takes you past the point of diminishing returns.

I spent quite some time comparing a wide range of CPUs commonly used in audio machines, using a standard test, the freely available DAWBench DSP Universal 2014 test. This reveals a lot about a CPU’s suitability for audio production: although it won’t give the full picture with regard to ASIO performance, which can vary according to other system factors, this test is designed specifically to ‘stress test’ a computer’s CPU. DAWBench has been discussed many times in SOS and there’s plenty of information at www.dawbench.com but, essentially, it loads instances of a plug-in in a DAW, measures the performance and generates a score. By restricting the tests to a standard plug-in (Cockos’s ReaXComp multiband compressor, in this case), DAWBench enables you to make meaningful comparisons in performance.

It’s important to keep other factors as consistent as possible, so I used the same USB audio interface (Native Instruments Komplete Audio 6) for all tests. Some expensive, better-performing interfaces may result in better performances overall, but the important thing is to establish a stable baseline. (It’s also worth mentioning that, while better interfaces are available, I have established via group testing that this particular model offers a great performance/price ratio for new users wanting to make music.) For the same reason, all tests were performed in an identical Windows 7 OS installation, and the same DAW software (Reaper) was used: we can therefore be sure that the differences in performance shown on the chart aren’t due to any differences in software.

I could have tested just the current generation of CPUs, but chose also to include legacy models because most users will keep a dedicated audio computer running for three to five years before upgrading — I hope the chart will give some indication of how the performance has progressed over the last several years, and how newer but lower-spec CPUs compare with older top-of-the-line ones.

As we’ll see, another important ‘take away’ from this test is that not all CPUs with the same nomenclature are created equal, even if they’re of the same generation and have the same naming system — this is particularly important to bear in mind when moving from a mobile to a desktop setup, or vice versa.

Finally, these tests are inevitably a snapshot: it won’t be too many months before a new bigger, badder, faster model comes along, so perhaps we’ll revisit this question when the next generation of chips becomes available. With all that in mind, let’s run through the results, from the bottom of the chart upwards...

First up, we have the Intel Core i3 4010U. It’s not surprising to see the U-series chips at the bottom of the chart: these chips were initially conceived to power the ‘ultrabook’ class of super-thin, lightweight laptops, were designed to offer longer battery life, and are closer in performance terms to the old Atom range than the desktop CPUs that share their ‘Core’ name. Nonetheless, they’re also found in many regular low- and mid-range full-size laptops as well as high-end tablets, including the Microsoft Surface Pro 3. For simple multitrack recording or editing of audio you’ve recorded on location they offer all you need in a portable and inexpensive machine. If you wish to use virtual instruments, effects chains or other processing, though, their performance will soon prove frustrating, and you’d be better off considering one of the more fully featured Intel models discussed later.

The next (small) step up is another popular mobile chip, the basic M-series. This includes the dual-core models found in the current entry-level MacBooks. They outperform the U-class chips, but by a small margin: the performance levels are around 25 percent greater and so they’re not much better for audio work. I’d suggest that anyone wanting an Apple laptop for anything approaching serious studio work should find the funds for a MacBook Pro — a refurbished or secondhand model from a generation or two ago will probably outperform the current standard-issue MacBooks, and will also offer greater connectivity.

I’d hoped to test chips as far back as the Intel Core2 Quad Q6600. This offered the best bang-per-buck of its generation and was regarded as the CPU of choice for studio use. Unfortunately, the OS build I used refused point blank to work with this chipset, and to keep the tests consistent this model was excluded. Still, some useful data can be found on the CPUBenchmark web site (see box), which suggests that it offers roughly 40 percent lower performance than the Core i7 930, and only 20 percent higher than our bottom-of-chart CPU.

The Intel Core i7 930 and its family of processors ran on the X58-type motherboard, which saw a number of changes and significant improvements to the chipset. This is the generation of advancement that saw Intel move the memory controller onto the CPU die itself, as well as introduce the QPI (quick path interconnect) board design that emulated AMD’s HyperTransport bus design. By shortening the distance that the data had to travel before being processed, this yielded some significant performance gains — this was the first generation that saw Intel stride ahead of AMD in the performance race.

The next cluster of entries make for interesting comparison. The Core i7 4710MQ is possibly the most popular current option for production laptops, with good reason. It offers the best price-to-performance ratio in Intel’s mobile CPU range: there’s a handful of better-performing models but they roughly double in price for each five-to-eight percent increase in performance. Similar performance levels are offered by the AMD FX9370 and FX9590 (AMD’s current flagship models) and Intel’s Core i7 2600K, their consumer-level flagship in 2012.

As you can see, even today the i7 2600K is no slouch. Similarly, the Core i7 4710MQ currently powers many a laptop for on-the-road producers and performers. For those wanting to use an audio machine on-the-hoof, these processors are not bad by any means and should still hold appeal, but if you’re thinking of a long-term investment in a studio machine, note how startlingly different the benchmark score is between these and the highest-performing desktop chips!

The AMD chips cost less, as do related components, because the motherboards go through fewer socket changes over the generations, which means a computer can be upgraded over a longer period of time. They’re therefore a tempting proposition if you’re putting together your first dedicated setup on a budget. Their downside is that they’re rather power hungry, largely because AMD have found themselves squeezing hard to get every last drop of performance out of a platform that’s overdue a redesign. For example, the FX9590 requires almost twice as much power as the current crop of high-end Intel chips. You don’t have to be an eco-warrior for this to be relevant: more power means more heat, which means more effective cooling and, ultimately, a noisier system. If you can isolate your computer and ensure that any generated noise levels are taken care of, the AMD models are well worth consideration — but the limited budget that makes them attractive probably also means a smaller space in which to record and mix, in which case the extra noise will be a most unwelcome guest!

The Core i7 3770K is another consumer-level flagship chip from a couple of generations ago. Above this we have the Core i3 4370. You may be surprised to see an i3 chip included in the tests, as these are positioned as Intel’s budget-friendly low-end, low-cost CPUs and compete with AMD in the home-office market. Its place here is warranted, though, as the first of the current generation: it shows us what to expect of the i3 range when lined up against other currently available products in the range. Although perfect for the sort of role for which they’re intended, they tend to lack raw processing power when it comes to much more than basic multitracking and editing duties in the studio. Also, AMD offer better-performing options in the same price bracket.

The Core i5 4690 is a capable chip and a good entry point for those who are working on a budget and building a basic studio computer. Again, at this price the AMD products enjoy a better performance-to-price ratio but at the cost of more heat/noise; the i5 4690’s power draw is around a third of the high-end AMD chips’ in the same price bracket.

Moving up the chart, we come to one of the most popular current options for the home studio, the Core i7 4790. I’ve included both its low-power ‘S’ and full-performance ‘K’ series revisions. The ‘S’ version of the chip has a total TDP (Thermal Design Power, the maximum heat generated by the CPU that the cooling system is typically required to dissipate) of 65W. By contrast, the TDP of AMD’s flagship chip is 220W! This shows just how efficient Intel have made their CPUs over the last few generations. Another model, below the ‘K’ and ‘S’ versions in performance terms, is the ‘T’ edition. This has an even lower TDP of 45W and is particularly well suited to smaller form-factor computers and to builds where passive cooling is desired (for instance, to create a truly silent PC).

The ‘K’ edition is the highest-clocked version; it gains another 30 percent performance and still manages a respectable 88W TDP — it’s easy to build a low-noise machine around one of these. This, along with the competitive price, gives it amongst the best price-to-performance ratios, and it’s probably the most popular current choice for those wishing to do everything in the box in a home studio. It also means it’s a reasonable upgrade for users of the i7 930 (or previous) generation of CPU — although looking at the scores of the i7 2600K and above, it may not prove such good value as an upgrade path in the longer term.

Another key difference between the ‘K’ chip and the standard one lies in how they ‘Turbo’ the CPU. The Turbo feature on the Intel CPUs allows them to run above the standard clock speed automatically when the system determines that there’s the overhead to do so without overheating or otherwise causing the CPU to become unstable. The Core i5 4690, for instance, is advertised with a 3.5GHz base clock speed and a 3.9GHz Turbo clock speed, and these quoted figures are the same for both the standard and K-series chips. The reason for a £10/$15 difference in price is a different implementation of the Turbo feature. The standard editions stagger the overclock across the four physical cores: in this instance, Turbo will overclock core 1 to 3.9GHz, core 2 to 3.8GHz, core 3 to 3.7GHz and core 4 to only 3.6GHz. The ‘K’ version, on the other hand, will overclock all cores to the advertised limit, so all the cores can work at 3.9GHz without resorting to traditional overclocking.

If we then compare the i7 4790K with its standard-edition counterpart, we see a 3.6GHz base clock with a 4GHz Turbo on the standard chip, versus a 4GHz base clock and 4.4GHz Turbo on the ‘K’ — the ‘K’ version represents much better value for audio users given the small increase in price. And that’s before we even begin to consider the benefits of a gentle overclock to 4.6GHz or above, which this chip is easily capable of.

Our next group of chips includes Intel’s so-called ‘Enthusiast’ CPUs and a couple of entries from its Xeon server class. The Core i7 4960X was the flagship chip of the previous generation and is more powerful than the current consumer-class CPUs. Being a six-core design (the mainstream ones all have four cores) this didn’t come as a surprise. The same generation’s Core i7 5820K has a similar specification and offers a small increase in performance but is unlikely to prove worthwhile to owners of the previous generation’s six-core chips.

For sheer performance, the latest generation of Intel Core i7 chips cannot be beaten. Intel’s current flagship desktop CPU is the eight-core Core i7 5960X. At its regular clock speed it’s more than a solid performer but, like previous flagship chips, it doesn’t offer great value for money compared with other processors in its class. It performs very well when overclocked, though, and it is, without a shadow of doubt, the best pick if CPU performance is the only criterion you’re interested in.

Along with the extra cores and overall performance, this chipset can support twice as many RAM DIMMs as previous ones, making it a perfect choice for those creating large, detailed arrangements using resource-intensive sample-based instruments such as Vienna Symphonic Library. In fact, it is users of such libraries (along with those who also do video work) who are usually pushing the limits of what systems can do.

This brings us nicely back to the Xeon server-class chips, which seem, at first glance, to hold so much promise. In principle, the differences between Xeon chips and their i7-series siblings are actually few and far between, the chief one being that Xeons benefit from an additional QPI link — something which enables data sharing between the two or more processors found in multi-socket Xeon systems. Another is the sheer number of cores: there are already Xeon chips available with as many as 18 cores, although these are seriously expensive! The downside of the QPI link is that the chips remain ‘locked down’ — in other words, overclocking isn’t possible and there’s extremely little scope to tweak for performance enhancement. The reason for this is that these CPUs are designed for server applications, in which it is of critical importance that data sharing between the cores remains stable. The Xeon chips with a high core count are also optimised for parallelisation.

Unfortunately, in tests of Xeon CPUs, my colleagues and I have tended to find that the higher the core count, the slower each individual core becomes. With load balancing being handled by both the OS and your sequencer, each trying to squeeze the most out of the available performance, some headroom is inevitably lost. In fact, fewer cores with higher clock speeds will almost always yield better performance overall than a greater number of cores running more slowly. This goes some way to explaining the strong showing by the higher speed-per-core overclocked i7 5960X over the slower-clocked but higher core-count Xeons in the test results.

But it’s actually slightly more complex than that, because I’ve tested Xeon CPUs with a number of sequencers, and it’s clear that the Xeon architecture has yet to be fully exploited by any DAW manufacturer. Multi-core support varies from one DAW to another, and from one version of a DAW to another: some support a maximum of 16 cores, while a few can handle up to 40. This upper limit includes so-called ‘hyper-threaded’ cores, which mean in Intel’s case that there are twice as many ‘virtual cores’ as there are physical processors.

The Xeon server-class chips offer as many as 36 virtual cores per CPU, but DAW software has yet to be designed to take advantage. For now, they remain an expensive luxury, probably only of benefit to composers who have reached the RAM ceiling of the current Core i7 systems — but that itself is something that looks set to change! By way of example, a hyper-threaded 18-core Xeon CPU makes 36 virtual cores available to your OS, and installing two such CPUs would give you a whopping 72 cores. Alas, as even the most multicore-friendly DAW can access only the first 40 cores, the other 32 would simply be released for other purposes, or remain idle. On the chart we can see this theory played out in the real world: the Xeon E5 2620 v3, a six-core 2.4GHz (3.2GHz Turbo) model, is comparable with the Xeon E5 2670 v3, a 12-core 2.3GHz (3.1GHz Turbo) CPU. When testing each of these, I used a pair of chips, so the E5 2670 v3s gave us a total of 48 threads (12 per chip makes 24, and hyperthreading doubles this to 48), and I witnessed the 40-thread wall in practice. While this pair of chips is capable of better performance than the chart shows, it requires a software developer to take advantage of it before you can reap the rewards, in practice.

Of course, having eight spare threads for Windows and other processes to use allows the DAW unimpeded access to the processing cores that are assigned to it. Yet this system sits pretty close to the top-of-consumer-range stock-clocked single-CPU eight-core offering… and the dual-Xeon solution costs roughly three times the 5960X! In fact, you could quite easily build a complete system around a Core i7 5960X for less than the cost of two of the Xeon CPUs alone, without any other system components. It’s safe to say, then, that while some hugely impressive Xeon CPUs are available, most users in a music-studio environment would get superior performance for less money with one of the high-end single-CPU setups. And if that’s really not enough for you, you’d probably be better off adding another machine and using software like Vienna Ensemble Pro to share the processing load across the two computers.

Until very recently, the Xeon platform offered one significant advantage over the X99/Core i7 systems: it could support a far larger amount of memory than any other chipset. Windows 8 Pro 64-bit can handle up to 512GB of RAM in theory, but the limit is lower in practice simply due to the motherboard’s ability to host that much RAM in the formats currently available. There are Xeon motherboards that can handle up to 192GB of RAM, whereas even the best ‘enthusiast’ motherboards could at the time of testing only accommodate a maximum of 64GB. Just as we were going to press, though, Corsair announced the availability of an (admittedly very expensive) 128GB RAM package. The only users really likely to require this much RAM are those who are scoring for film using the huge libraries made by firms like Vienna Symphonic Library, where some users’ templates can break the 64GB RAM limit of most consumer-level hardware. The Xeon solutions can, for those who require the very best performance and are willing to pay, offer a wonderful solution, but it looks very much like the Core i7 systems will soon be capable of similar performance.

So, after all this, what processor should you have in your next studio PC? If your current machine was a mid-range model three or four years ago or a high-end one five years ago, or if you’re looking to move up from a mobile setup to a dedicated desktop machine, then most of the current desktop offerings from Intel will offer a very noticeable improvement. Looking at the upper mid-range Intel solutions will, without a doubt, give you a solid increase in performance. Anyone working below the six- or eight-core Intel ‘enthusiast’ models will currently also see a great performance increase by moving up to a system based around these chips. On the other hand, owners of the previous-generation six-core i7s or older Xeon systems may find it hard to see value in an upgrade below the current eight-core i7s — or the high-end Xeon products, if memory expansion is right up at the top of your list of requirements.

For many years Intel and AMD were locked in a CPU arms race: newer, more powerful designs were brought to market rapidly, and every five or six years a complete architecture redesign would propel one company ahead. Recent years have seen a less aggressive release schedule, but there’s still a constant stream of improvements and refinements that can benefit studio users.

Intel’s last generational shift brought the first fully redesigned and optimised i-series CPUs (Core i5, i7 and so on). This family of chips has been with us since the turn of the decade, but while the headline figures show only modest improvements, the chips are becoming ever more efficient and thus offer better performance. Intel’s so-called ‘Tick-Tock’ release and update schedule benefits studio users. The ‘Tick’ stage refers to the CPU range receiving a ‘die shrink’, which allows for increases in performance and energy efficiency through design refinement. The ‘Tock’ brings a new micro-architecture, with the introduction of new features and functions. The full cycle completes once every 12-18 months and means that we see significant performance gains only every few years, but lower power-draw editions in between, which give us quieter cooling.

AMD’s base design predates Intel’s by a few years, and they don’t currently offer significant competition in the high-performance market. However, they’ve carved out niches with their well-established hardware and continue to hold market share, mostly through reasonably performing chips that compete strongly on price. Much of their recent sales growth is due to their provision of chips for games consoles, but they’ve sensibly been using these contracts to fund development of their next generation of hardware, and making a lot of noise about the planned release of new platforms and chipsets in 2016. Can they re-establish themselves as a provider of high-performance chips? It promises to be a very interesting year!

CPUBenchmark is a web site that hosts a database of CPU test results. Results are updated daily, which means that even without specific DAW benchmarking it should still be possible for you to get some idea of the degree of performance improvement a proposed upgrade could deliver — even for CPUs not featured in my own tests.

www.cpubenchmark.net

Speed comparison of dedicated server processors / Habr

amarao

Selectel Blog

During the discussion of the announcement of leased Atom-based servers for 1,500 rubles, the question arose of how much slower the Atom is than other platforms.

The question was interesting. Here are the test results. We used sysbench as the test (it is available in many Linux distributions, including Ubuntu and Debian Squeeze).

The tests distinguish between two cases: single-threaded and multi-threaded load. A typical load on a busy web server is multi-threaded. A typical workload for a single-threaded application (e.g. gzip on a large file) is single-threaded.

It is clear that the Atoms lag far behind the Core/Xeon processors, which, running just a single thread, are one and a half to two times faster than an Atom using both of its cores plus Hyper-Threading.

Another interesting observation: on the Atom, the 32-bit and 64-bit modes perform the same, while on all the other processors the 64-bit mode noticeably outperforms the 32-bit one.

The benchmark command was nice -20 sysbench --test=cpu --cpu-max-prime=40000 --num-threads=X run, where X is the number of threads, from 1 to 16.
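
If you want to reproduce the runs, here is a minimal sketch (assuming the legacy sysbench CLI used above, which prints a "total time:" line; raising priority with nice -20 typically requires root):

```python
import re
import subprocess

def run_sysbench(threads: int, max_prime: int = 40000) -> float:
    """Run one sysbench CPU test and return the reported total time in seconds."""
    cmd = [
        "nice", "-20", "sysbench",          # raising priority needs root; drop "nice" otherwise
        "--test=cpu",
        f"--cpu-max-prime={max_prime}",
        f"--num-threads={threads}",
        "run",
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    match = re.search(r"total time:\s*([\d.]+)s", out)  # e.g. "total time: 124.5700s"
    if match is None:
        raise RuntimeError("could not parse sysbench output")
    return float(match.group(1))

if __name__ == "__main__":
    for threads in (1, 2, 4, 8, 16):
        print(f"{threads:>2} thread(s): {run_sysbench(threads):8.2f} s")
```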

Model | Cores | Single-threaded time, s (32-bit / 64-bit) | Multi-threaded time, s (32-bit / 64-bit)
Atom D510 1.66 GHz | 2 + HT | 412.83 / 412.82 | 124.55 / 124.57
Core2Duo E8400 3.0 GHz | 2 | 61.89 / 54.98 | 30.91 / 27.49
Core2Quad Q8300 2.5 GHz | 4 | 73.47 / 66.03 | 18.43 / 16.56
2 x Xeon 5504 2.0 GHz | 8 | 90.31 / 83.07 | 11.29 / 10.45
2 x Xeon 5620 | 8 + HT | 75.28 / 76.43 | 8.91 / 7.21

(Lower is better: the figures are sysbench execution times in seconds.)

This data is shown on the graph. Note the semi-logarithmic scale (on a linear scale, the gap between the Atoms and the rest was so large that it made no sense to compare the other processors with each other). It is also clear that the Core2Duo is the fastest in single-threaded execution (which is quite understandable, because this is a desktop processor tuned for resource-intensive single-threaded applications, like games). The Xeons, on the other hand, show their speed with maximum parallelization of the tasks performed (a typical server load).

Of particular interest is the gain from HT (two of the tested models have HT: the Atom D510 and the Xeon 5620):

Processor | Atom D510 1.66 GHz | 2 x Intel Xeon 5620
Without HT (multi-threaded time, s) | 206.47 | 11.35
With HT (multi-threaded time, s) | 124.57 | 7.21
Speed increase | 65.7% | 57.4%
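
As a quick sanity check, the quoted percentages follow directly from the run times (lower time means faster):

```python
def ht_speedup(time_without_ht: float, time_with_ht: float) -> float:
    """Relative speed gain from Hyper-Threading, computed from execution times."""
    return time_without_ht / time_with_ht - 1.0

print(f"Atom D510:      {ht_speedup(206.47, 124.57):.1%}")  # ~65.7%
print(f"2 x Xeon 5620:  {ht_speedup(11.35, 7.21):.1%}")     # ~57.4%
```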

This comparison is for the processors only and does not fully characterize the platforms. The Atoms come with 2 GB of memory, the Core2 machines with 6 GB, and the Xeons with 12 and 24 GB respectively; the disks also differ.

P.S. Since the logarithmic scale has raised a lot of questions, here is a different representation of the same data: processor performance relative to the best (i.e. 100%). The scale is linear.


    One of the methods for choosing a processor for a computer system


    When designing computing systems that run proprietary software, a number of difficulties arise. In particular, it is desirable to estimate in advance the performance of the platform required to run a specialized application. A misjudgment can result in either insufficient performance or wasted cost. This article describes a method for estimating processor performance for a specific computational task.


    Introduction

    In its daily work, the PROSOFT project office team takes on new projects that require the design of a combined software and hardware architecture. When developing such systems there are many problems and constraints associated, for example, with selecting equipment to fit into a metal rack, turnstile or terminal while meeting a number of requirements:

    • technical requirements of the customer;

    • a form factor that would allow designers not only to place equipment in a limited space, but also to comply with ergonomic requirements;

    • availability of a modern interface for communication between equipment and a computer;

    • availability of an SDK (Software Development Kit) or API (Application Programming Interface) for integration into proprietary software. Not all manufacturers take this point seriously; some provide only a closed user application with their products, cutting off any possibility of third-party software integration. Yet integrating equipment into third-party systems opens up an additional sales market that could well recoup the cost of developing and documenting an external API;

    • the presence of a computing module, or simply a computer, that meets the performance requirements for 24/7/365 operation and has passive cooling. This is an important point, since fans in such systems accumulate dust and, as a result, increase the frequency and complexity of maintenance.

    In systems architecture, almost all of these requirements are deterministic, requiring the system architect or engineer to find the right combination of all the technical characteristics. One characteristic, however, remains hard to formalize, calculate and predict: computer performance. How do you determine which processor to choose? How much RAM is required? 99% of the time, at the stage of designing the hardware structure and drawing up specifications, we have no information about how resource-intensive the software will actually be. Will, for example, the video recognition system work without noticeable delays?

    Identification of bottlenecks

    In this matter you always have to strike a balance between a high-performance platform and price. The project budget does not always allow you to order a latest-generation Intel Core i7 or i9 CPU with 16 GB of RAM and stop worrying about whether the installed software will have enough resources. The question is how to find the optimal ratio of performance to price.

    Let’s figure it out. As a rule, a relative assessment of computer performance is based on the characteristics of the RAM and of the central and graphics processors (CPU and GPU).

    Estimating the required amount of RAM is usually limited to manual testing of the software using the task manager to determine the average amount of RAM required when running the most resource-intensive tasks.

    When choosing a CPU, you need to pay attention to two main characteristics that determine its relative performance:

    1. frequency, which determines the number of operations per second on one core;

    2. the number of physical cores within a single processor.

    These specs do not take into account architectural differences (AMD vs. Intel, Haswell vs. Kaby Lake, etc.), but when comparing two processors they are the main parameters that determine relative performance. If the software you plan to use on your system uses only one processor core, then the processor frequency is a good and fairly accurate measure of performance.

    However, if the software can use multiple cores, it becomes harder to evaluate processor performance, since almost no application uses all the cores with maximum efficiency. The question then is how to evaluate how efficiently the software parallelizes its computations.

    Evaluation of processor performance for a specific application

    There are several different ways to calculate the efficiency of parallel computing. In this article we will use Amdahl’s law [1–3], which describes how system performance growth is limited as the number of processing units increases.

    Returning to our task, this law implies that as the number of processor cores grows, the performance gain will slow down if all the cores cannot be used efficiently at the same time. If you know the parallelization efficiency, you can calculate the point at which fewer cores running at a higher frequency outperform more cores running at a lower frequency. In a simplified form, Amdahl’s law is as follows:

    S(n) = 1 / ((1 - p) + p / n)    (1)

    Here S(n) is the theoretical speed-up of the software when using n processor cores (threads), and p is the fraction of the algorithm that can be distributed across multiple cores, that is, the parallelization efficiency. Accordingly, 1 - p is the part of the program that can only be executed sequentially.
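
    As a quick illustration (a minimal sketch, not part of the original article), formula (1) in code shows how quickly the theoretical speed-up saturates for a typical parallelization efficiency:

    ```python
    def amdahl_speedup(p: float, n: int) -> float:
        """Theoretical speed-up S(n) for parallel fraction p on n cores (Amdahl's law)."""
        return 1.0 / ((1.0 - p) + p / n)

    # With p = 70%, going from 4 to 64 cores adds surprisingly little:
    for n in (1, 2, 4, 8, 16, 64):
        print(f"{n:>2} cores: S = {amdahl_speedup(0.7, n):.2f}x")
    ```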

    The simplest and fastest way to determine the parallelization efficiency is to run a series of test measurements: time the execution of some cyclic software operation with different numbers of cores involved. Don’t worry, for testing it is enough to use software tools to limit the number of cores involved. On Windows, you set the affinity of the process in question through the Task Manager (on the Linux family you can use the taskset command).

    Please note that if the processor supports Hyper-Threading, the list will show twice as many cores as the CPU physically has. You can either disable Hyper-Threading in the BIOS before running the test, or simply select two threads for each CPU core you want to test. Hyper-Threading threads always appear immediately after the physical core in Windows, so you must select two entries for each physical CPU core you want the program to use. In other words, choosing cores 1 and 2 will allow the program to use only one physical CPU core, choosing cores 1 to 4 will allow it to use two physical CPU cores, and so on. Before the measurements it is recommended to disable the Turbo Boost function in the BIOS; this will improve the accuracy of the results. Fig. 1 shows an example of setting process affinity through the Windows Task Manager.
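
    On Linux the same restriction can be scripted rather than set by hand. The sketch below is only an illustration (heavy_operation is a hypothetical placeholder for whatever cyclic operation you are profiling, and os.sched_setaffinity is Linux-specific): it pins the current process to the first N logical CPUs and times the operation.

    ```python
    import os
    import time

    def heavy_operation() -> None:
        # Placeholder for the cyclic operation under test,
        # e.g. recognizing and parsing one scanned document.
        sum(i * i for i in range(10_000_000))

    def time_on_n_cpus(n: int, repeats: int = 3) -> float:
        """Pin this process to the first n logical CPUs and return the best run time."""
        # Note: on Hyper-Threaded CPUs, check how logical CPUs map to physical cores
        # (see the remark above about selecting two threads per physical core).
        os.sched_setaffinity(0, set(range(n)))  # Linux counterpart of 'Set Affinity' / taskset
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            heavy_operation()
            best = min(best, time.perf_counter() - start)
        return best

    if __name__ == "__main__":
        for n in (1, 2, 4, 8):
            print(f"{n} CPU(s): {time_on_n_cpus(n):.2f} s")
    ```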

    Now, with the ability to limit the number of cores your software uses, you can measure how long the algorithm runs. For example, take a program that recognizes scanned documents: record how long it takes to recognize and parse all of its fields with different numbers of cores involved. The first and second columns of Table 1 show an example of recording such measurement results.

    It should be noted that the affinity setting applies to a specific instance of the program; the next time you run the program you will have to set it again. If you want to quickly test a single action with a different number of CPU cores, you don’t need to close the program before changing the number of cores involved: just click «Set Affinity» and change it while the program is running. You will, however, get more accurate results by closing the program between runs of the algorithm under test, as this clears the RAM allocated to the program.

    At the next step we have the data needed to determine the parallelization efficiency p. To begin with, we can evaluate how much faster the program performed the operation with n processor cores relative to the execution time on a single core. For example, if the execution time on four cores was 2.5 s and on one core 5.2 s, the measured speed-up is 5.2 / 2.5 = 2.08.

    For an exact calculation of the parameter p, the least-squares method for a non-linear function could be applied to the measured speed-up values, but this is a rather time-consuming and complex process. A rougher estimate is enough for us: the time a highly qualified specialist would spend on an exact calculation may cost more than the budget saved by choosing the optimal solution.

    There is a simple way to estimate the parallelization efficiency with sufficient accuracy. Assume the parallelization efficiency is 50%: calculate the theoretical speed-up for this value using formula (1) and plot it in any convenient package, such as Excel (Fig. 2).

    From Fig. 2 it can be seen that the actual speed-up as the number of cores increases is higher than that predicted by Amdahl’s law for p = 50%, so the parallelization efficiency must be higher. Experimenting, we settle on p = 70% (Fig. 3).
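
    The same trial-and-error fit can be done in a few lines of code instead of a spreadsheet. The sketch below is illustrative only: the measured times are made-up values standing in for Table 1, and the p giving the smallest squared error against the measured speed-ups is selected.

    ```python
    # Hypothetical best run times of the profiled operation, keyed by physical core count.
    measured_times = {1: 5.2, 2: 3.3, 3: 2.8, 4: 2.5}

    def amdahl_speedup(p: float, n: int) -> float:
        return 1.0 / ((1.0 - p) + p / n)

    t1 = measured_times[1]
    measured_speedups = {n: t1 / t for n, t in measured_times.items()}

    def fit_error(p: float) -> float:
        """Sum of squared differences between measured and theoretical speed-ups."""
        return sum((s - amdahl_speedup(p, n)) ** 2 for n, s in measured_speedups.items())

    # A brute-force search over p in 1% steps is accurate enough for this purpose.
    best_p = min((k / 100 for k in range(1, 100)), key=fit_error)
    print(f"Estimated parallelization efficiency p ≈ {best_p:.0%}")
    ```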

    Having obtained the calculated efficiency of software parallelization on the processor under test, we can evaluate the performance of this processor and select potential candidates for system development.

    First, you need to calculate the effective number of cores used, that is, the number of CPU cores on which the program runs with optimal performance.

    To do this, you can again use formula (1), but instead of the total number of physical and virtual cores, use only the number of physical cores:

    efc = 1 / ((1 - p) + p / phc)    (2)

    Here efc is the effective number of cores and phc is the physical number of processor cores. We can then multiply the number of effective cores by the CPU’s operating frequency f to estimate how many floating-point operations per second the CPU can perform. This value is usually expressed in flops (FLOPS, FLoating-point Operations Per Second).

    Flops is a non-SI unit for evaluating the performance of computing systems. In this case it doesn’t matter how many floating-point operations per cycle a particular processor actually performs: firstly, this value is absent from CPU specifications; secondly, the method assumes that the processors being compared belong to the same family or architecture. We can now estimate the time T_perf that the same operation would take on a candidate processor, relative to the one tested in the previous step:

    T_perf = min(RT) × (efc_cpu1 × f_cpu1) / (efc_cpu2 × f_cpu2)    (3)

    where min(RT) is the minimum execution time of the operation under test (the value obtained in the second column of Table 1), cpu1 is the tested CPU and cpu2 is the CPU being compared.

    Let’s do an experimental evaluation. The results in Table 1 were obtained on an Intel Core i5 7200 processor. Suppose we are not satisfied with the document scanning time of 2.5 seconds. Let’s see how performance changes if we replace the processor with the higher-end Intel Core i7 7500. To do this, we carry out the calculations using formulas (2) and (3) for each processor and estimate how much the operation’s execution time will change. These processors have the same number of effective cores; the difference is only in frequency.

    Carrying out the calculation shows that, when switching to a Core i7 of the same architecture, the time to scan a document is reduced by about 0.3 s. It is up to the system architect to decide whether this optimization is acceptable and worth the extra cost of a slightly more powerful processor.
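
    For completeness, here is the same projection expressed in code. It is a sketch under stated assumptions: p = 70% and the measured 2.5 s from above, while the core counts and clock frequencies are placeholders rather than vendor-verified figures, so the printed saving will differ somewhat from the roughly 0.3 s quoted in the article.

    ```python
    def effective_cores(p: float, physical_cores: int) -> float:
        """Formula (2): effective number of cores for parallel fraction p."""
        return 1.0 / ((1.0 - p) + p / physical_cores)

    def projected_time(measured_time: float, p: float,
                       cores1: int, freq1_ghz: float,
                       cores2: int, freq2_ghz: float) -> float:
        """Formula (3): project the measured time from the tested CPU (1) onto a candidate CPU (2)."""
        perf1 = effective_cores(p, cores1) * freq1_ghz
        perf2 = effective_cores(p, cores2) * freq2_ghz
        return measured_time * perf1 / perf2

    # Placeholder figures: both CPUs treated as dual-core; frequencies are assumptions.
    t_new = projected_time(measured_time=2.5, p=0.7,
                           cores1=2, freq1_ghz=2.5,   # tested CPU
                           cores2=2, freq2_ghz=2.7)   # candidate CPU
    print(f"Projected time on the candidate CPU: {t_new:.2f} s")
    ```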

    Conclusions

    A few important remarks need to be made. This evaluation method assumes that the processors being compared have a similar architecture. If you are interested in a CPU with a completely different architecture, you can still use the method to rank several CPU models within that family against each other, but it likely won’t accurately reflect actual performance. There are other limitations as well. Remember that Amdahl’s law only applies while the bottleneck is the CPU: if the task is not limited by processor resources, a further increase in the number of cores will no longer improve performance. If the graphics card, RAM or hard drive is what keeps the program from running faster, adding more CPU cores will never help, even if the program parallelizes 100%.

    The method described in this article is used by the project office team for the initial assessment of the required equipment. The resulting estimate does not, of course, give exact results, but it makes it possible to calculate and justify the specification of the proposed system using a formalized method. We plan to partially automate this process: statistics on the system resources used by a specific application can be collected, simple calculations based on the described method can be performed (technical characteristics of various platforms can be downloaded from open databases), and on that basis a decision-support system can recommend a specific processor to achieve the required performance.

    Literature

    1. Gustafson J.L. Amdahl’s Law // Encyclopedia of Parallel Computing. – Boston: Springer, 2011.

    2. Bach M. Estimating CPU Performance Using Amdahl’s Law. – USA: Puget Systems, 2015.

    3. Popov G., Mastorakis N., Mladenov V. Calculation of the Acceleration of Parallel Programs as a Function of the Number of Threads // Latest Trends on Computers.