The Intel Xeon E5 v4 Review: Testing Broadwell-EP With Demanding Server Workloads

by Johan De Gelas on March 31, 2016 12:30 PM EST

Posted in: CPUs, Intel, Xeon, Enterprise, Enterprise CPUs, Broadwell

  • Broadwell Reaches Xeon E5
  • Broadwell-EP: The 14nm Xeon E5
  • Broadwell Architecture Improvements
  • Sharing Cache and Memory Resources
  • TSX and Faster Virtualization
  • Xeon E5 v4 SKUs and Pricing
  • Benchmark Configuration and Methodology
  • Single Core Integer Performance With SPEC CPU2006
  • Memory Subsystem
  • Multi-Threaded Integer Performance
  • Database Performance
  • Application Development: Linux Kernel Compile
  • SAP S&D 2-Tier
  • Apache Spark 1.5: The Ultimate Big Data Cruncher
  • Spark Benchmarking
  • HPC: Fluid Dynamics with OpenFOAM
  • NAMD
  • Closing Thoughts

We have been spoiled. Since the introduction of the Xeon 5500 «Nehalem» (March 2009), Intel has increased the core counts of its Xeon CPUs by nearly 50% almost every 18 months. We went from four to six cores with the Xeon 5600 in June 2010. Sandy Bridge (Xeon E5-2600, March 2012) increased the core count to 8. That is only 33% more cores, but each core was substantially faster than those of the previous generation. Ivy Bridge-EP (Xeon E5-2600 v2, September 2013) increased the core count from 8 to 12, and Haswell-EP (Xeon E5-2600 v3, September 2014) surprised us with an 18-core flagship SKU.

However, this could not go on forever. Sooner or later Intel would need to slow down on adding cores, for both power and die-area reasons, and today Intel has finally pumped the brakes a bit.

Launching today is the latest generation of Intel’s Xeon E5 processors, the Xeon E5 v4 series. Fifteen months after Intel’s Broadwell architecture and 14nm process first reached consumers, Broadwell has finally reached the multi-socket server space with Broadwell-EP. Like past EP cores, Broadwell-EP is the bigger, badder sibling of the consumer Broadwell parts, offering more cores, more memory bandwidth, more cache, and more server-focused features. And thanks to the jump from its 22nm process to its current-generation 14nm process, Intel gets to reap the benefits of a smaller, denser process.

Getting back to core counts, then: even with the jump to 14nm, Intel has played it more conservatively. Compared to the Xeon E5 v3 (Haswell-EP), the Xeon E5 v4 (Broadwell-EP) makes a smaller jump, going from 18 cores to 24 cores on the largest die, an increase of 33%. Yet even then, the new Xeon E5 v4 SKUs have «only» 22 cores enabled, so we won’t get to see everything Broadwell-EP is capable of right away.
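The «nearly 50%» cadence, and where it breaks, can be checked with a few lines of Python. This sketch only recomputes the generation-over-generation increases from the flagship core counts quoted above (using the 22 enabled cores for Broadwell-EP); the list and function names are illustrative.

```python
# Flagship core counts per Xeon generation, as quoted in the text above.
generations = [
    ("Xeon 5500 (Nehalem)", 4),
    ("Xeon 5600", 6),
    ("Xeon E5-2600 (Sandy Bridge-EP)", 8),
    ("Xeon E5-2600 v2 (Ivy Bridge-EP)", 12),
    ("Xeon E5-2600 v3 (Haswell-EP)", 18),
    ("Xeon E5-2600 v4 (Broadwell-EP, enabled cores)", 22),
]


def growth_steps(gens):
    """Yield (name, cores, percent increase over the previous generation)."""
    for (_, prev), (name, cores) in zip(gens, gens[1:]):
        yield name, cores, 100.0 * (cores - prev) / prev


# Prints increases of +50%, +33%, +50%, +50% and finally +22%.
for name, cores, pct in growth_steps(generations):
    print(f"{name:48s} {cores:2d} cores  +{pct:.0f}%")
```

Counting the full 24-core Broadwell-EP die instead of the 22 enabled cores would give the 33% figure mentioned in the text.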

Meanwhile, the highest (turbo) clockspeed is still 3.6 GHz, base clocks have been lowered by one or two steps, and the per-core improvements are very modest (about 5%). Consequently, performance-wise, this is probably the least spectacular product refresh we have seen in many years.
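To see why more cores, slightly lower base clocks and a ~5% per-core gain add up to a modest refresh, a naive throughput model (cores × clock × per-core performance) is enough. This is a back-of-the-envelope sketch, not the methodology used in this review; the 2.2 and 2.3 GHz all-core clocks are assumed inputs standing in for «one step lower» base clocks.

```python
def naive_relative_throughput(cores_new: int, cores_old: int,
                              clock_new_ghz: float, clock_old_ghz: float,
                              ipc_gain: float) -> float:
    """First-order estimate: throughput ~ cores x clock x per-core performance.

    Ignores memory bandwidth, uncore and turbo behaviour, so it is only a rough
    figure for perfectly parallel code, not a prediction.
    """
    return (cores_new / cores_old) * (clock_new_ghz / clock_old_ghz) * (1.0 + ipc_gain)


# Assumed inputs: 22 vs 18 cores, all-core clocks of 2.2 vs 2.3 GHz
# (one step lower), and the ~5% per-core gain mentioned above.
estimate = naive_relative_throughput(22, 18, 2.2, 2.3, 0.05)
print(f"{estimate:.2f}x")  # 1.23x
```

In practice, memory bandwidth, turbo behaviour and imperfect scaling usually pull real-world gains below such an estimate.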

But there are still enough paper specs to make the Broadwell version of the Xeon E5 attractive. It uses the same LGA 2011-3 socket. Few people will upgrade in place from Xeon E5 v3s to Xeon E5 v4s, but reusing the same platform means lower costs for the server vendors and more mature software (drivers etc.) for the buyers.

They look very different but fit in the same socket: Xeon E5 v4 on top, Xeon E5 v3 at the bottom

Broadwell also has several features that make it a more attractive processor for virtualized servers. It offers finer-grained control over how applications share the uncore (caches and memory bandwidth), to avoid scenarios where low-priority applications slow down high-priority ones. Meanwhile, quite a few improvements have been made to make I/O-intensive applications run more smoothly on top of a virtualization layer. Most businesses run their applications virtualized, virtualization is still the key ingredient of fast-growing cloud services (Amazon, Digital Ocean, Azure...), and more and more telecom operators are starting to virtualize their services, so these new features will definitely be put to good use. And of course, Intel made quite a few subtle, but noteworthy, tweaks to keep the HPC crowd (mostly «simulation» and «scientific calculation» software) happy.
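On Linux, this kind of cache partitioning (Intel’s Cache Allocation Technology, part of Resource Director Technology) is exposed through the resctrl filesystem, where each class of service gets a bitmask of last-level-cache ways. This Python sketch only computes and formats such masks; the 20-way cache, the two cache ids and the 16/4 split are illustrative assumptions, and nothing is written to the system.

```python
def contiguous_mask(n_ways: int, first_way: int, count: int) -> int:
    """Bitmask with `count` contiguous bits set, starting at bit `first_way`.

    Intel's cache allocation requires the set bits to be contiguous.
    """
    if count < 1 or first_way < 0 or first_way + count > n_ways:
        raise ValueError("mask must be non-empty and fit inside the cache")
    return ((1 << count) - 1) << first_way


def schemata_line(mask: int, cache_ids) -> str:
    """Format an L3 allocation line in the style of a Linux resctrl 'schemata' file."""
    return "L3:" + ";".join(f"{cid}={mask:x}" for cid in cache_ids)


N_WAYS = 20         # assumption: illustrative last-level-cache associativity
CACHE_IDS = (0, 1)  # assumption: one L3 cache id per socket in a 2-socket server

# Give the high-priority class 16 ways and confine the low-priority class to
# the other 4, so neither class can fill lines into the other's ways.
high = contiguous_mask(N_WAYS, 4, 16)
low = contiguous_mask(N_WAYS, 0, 4)
assert high & low == 0  # the partitions do not overlap

print(schemata_line(high, CACHE_IDS))  # L3:0=ffff0;1=ffff0
print(schemata_line(low, CACHE_IDS))   # L3:0=f;1=f
```

Actually applying such lines means writing them to a resource group's schemata file under /sys/fs/resctrl as root; the kernel documentation describes the exact rules for a given CPU.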

But don’t make the mistake of thinking that virtualization and HPC are the only candidates for the new Xeons with up to 22 cores. The newest generation of data analytics frameworks has made enormous performance steps forward by easing the network and storage bandwidth bottlenecks. One example is Apache Spark, which can crunch through terabytes of data much more efficiently than its grandparent Hadoop by making better use of RAM. To extract results from a massive heap of text data, for example, you can use some of the most advanced statistical and machine learning algorithms. Mix machine learning with data mining and you get an application that is incredibly CPU-hungry but does not need the latest and fastest NVMe-based SSDs to keep the CPU busy.

Yes, we are proud to present our new benchmark based upon Apache Spark in this review. Combining analytics software with machine learning to get deeper insights is one of the most exciting trends in the enterprise world. And it is also one of the reasons why even a 22-core Broadwell is still not fast enough.

Closing Thoughts

With the limited amount of time we had with the new Broadwell-EP Xeons ahead of today’s embargo, we spent most of it on our new benchmarks. However, we did a quick check on power as well. It looks like both idle power and load power (when running a full floating point workload) have decreased a little, but we need to do a more extensive check to confirm and characterize this.

Meanwhile, considering what a wonderful offering the Xeon E5-2650L v3 was, it is a pity that Intel did not include such a low-power SKU among our review samples. The Xeon E5-2699 v4 is a solid product, but it’s not a home run. This may just be a hiccup in our current setup (firmware?), but it seems the new Xeon E5 v4s do not reach the same turbo speeds as our Xeon E5 v3s. As a result, single-threaded performance is sometimes slightly lower, and the new processor has to rely on its extra cores to beat the previous one.

We noticed this mostly in the HPC applications, where the new Xeon is a bit of a mixed bag. Still, considering that 72 to 88 threads are a bit much for lots of interesting applications (Spark, SQL databases...), there is definitely room for processors that sacrifice high core counts for higher single-threaded performance (without exaggerating). We have been stuck at 3.6 GHz for way too long.
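The 72 to 88 thread range follows from a dual-socket server with Hyper-Threading (two threads per core) enabled; both are assumptions of this small sketch, and the function name is illustrative. Python’s os.cpu_count() reports the same kind of number for the machine it runs on, which is a quick way to see how many threads an application would have to keep busy.

```python
import os


def logical_cpus(sockets: int, cores_per_socket: int, threads_per_core: int = 2) -> int:
    """Hardware threads the OS sees: sockets x cores x SMT threads per core."""
    return sockets * cores_per_socket * threads_per_core


print(logical_cpus(2, 18))  # 72, e.g. two 18-core Xeon E5 v3 parts
print(logical_cpus(2, 22))  # 88, e.g. two 22-core Xeon E5-2699 v4 parts
print(os.cpu_count())       # logical CPUs on the machine running this script
```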

With that said, there is little doubt that the Xeon E5-2699 v4 delivers in the one application that matters the most: virtualization.

Although we have not yet tested extensively on top of a hypervisor, we are pretty sure that the extra cores and the lower VM exit latencies will make this CPU perform well in virtualized environments. Intel’s Resource Director Technology and the many improvements (such as posted interrupts) that help the hypervisor perform better in I/O-intensive tasks are very attractive features.

Compared to the Haswell-EP based Xeon E5 v3s, performance has increased by about 20% in key applications such as databases and ERP applications. That is not much, and while we can complain all we want about the slight regression in single-threaded performance in some cases, the fact of the matter is that Intel has increased performance by 2 to 2.7 times in four years in those key applications, all the while holding power consumption more or less constant. In other words, it will pay off to upgrade those Sandy Bridge-EP servers. And for many enterprises, that is what matters.
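To put «2 to 2.7 times in four years» in perspective, the total speedup S can be converted into a compound annual rate, r = S^(1/4) - 1. This Python sketch does only that arithmetic on the figures quoted above; the function name is illustrative. Since power consumption stayed roughly flat, the same rates apply approximately to performance per watt.

```python
def annual_growth(total_speedup: float, years: float) -> float:
    """Compound annual growth rate implied by a total speedup over `years`."""
    return total_speedup ** (1.0 / years) - 1.0


for speedup in (2.0, 2.7):
    print(f"{speedup}x over 4 years -> {annual_growth(speedup, 4):.1%} per year")
# 2.0x over 4 years -> 18.9% per year
# 2.7x over 4 years -> 28.2% per year
```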

16 cores and new frustration / Processors and memory

Cooling

The processor places no special demands on the cooling system: in normal use, a simple top-down (horizontal) cooler with a copper base is enough. Nevertheless, if sustained high loads are planned, it is better to choose an inexpensive tower cooler. Models with 2-3 heat pipes are enough to keep temperatures well away from critical levels. More powerful coolers are worth considering if you plan to upgrade to a hotter CPU in the future.

Proven coolers

2 heat pipes:

  • AeroCool Air Frost 2
  • Cooler Master Hyper T200

3 heat pipes:

  • PCcooler GI-X3 V2
  • ID-COOLING SE-223

4 heat pipes:

  • ID-COOLING SE-224-XT
  • PCcooler GI-X4 v2
  • Deepcool GAMMAXX S40
  • Zalman CNPS10X Optima II

Among the models available on AliExpress, the Snowman MT4 is considered a good option, but there is also a six-heat-pipe model that costs almost the same.

Revisions

There are as many as 5 revisions of this model. 2 of them are final (retail) versions, and they belong to different steppings. Determining the version is easy: just look at the code printed on the heat spreader.

Engineering or qualification samples (stepping):

  • QBF1 (C1)
  • QBV3 (C2)
  • QB7X (stepping unknown)

Final versions (stepping):

  • SR0H6 (C1)
  • SR0KV (C2)

The C1 stepping does not support VT-d virtualization or Trusted Execution Technology.
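Because the stepping determines VT-d and TXT support, the revisions list above can be turned into a small lookup keyed by the code on the heat spreader. This Python sketch only encodes the five markings listed here, and the function name is illustrative. Treating C2 as supporting VT-d and TXT is an inference from the note that only C1 lacks them.

```python
# Markings and steppings from the revisions list above; None = stepping unknown.
REVISIONS = {
    "QBF1": ("engineering/qualification sample", "C1"),
    "QBV3": ("engineering/qualification sample", "C2"),
    "QB7X": ("engineering/qualification sample", None),
    "SR0H6": ("final", "C1"),
    "SR0KV": ("final", "C2"),
}


def lookup(marking: str) -> dict:
    """Describe a CPU by the code printed on its heat spreader."""
    kind, stepping = REVISIONS[marking.strip().upper()]
    if stepping is None:
        vt_d_txt = None  # stepping unknown, so support is unknown
    else:
        # The text only says C1 lacks VT-d and TXT; C2 is assumed to have them.
        vt_d_txt = stepping != "C1"
    return {"kind": kind, "stepping": stepping, "vt_d_and_txt": vt_d_txt}


print(lookup("sr0h6"))
# {'kind': 'final', 'stepping': 'C1', 'vt_d_and_txt': False}
```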
