IBM Power8 vs Intel Xeon: How IBM Stacks Up Power8 Against Xeon Servers

Since putting together the OpenPower Foundation two years ago, IBM and its partners have been working to get lower-cost Power8 machines into the field to better compete against the ubiquitous Xeon server platform. With the Power Systems LC machines announced last week, the gap is closing, at least according to the configurations that IBM is stacking up and the performance benchmarks that it has run.

IBM has thus far kept its competitive analysis to itself and its reseller partners, but The Next Platform has gotten its hands on what Big Blue is saying. We have reached out to Intel to get its thoughts on the new Power Systems LC machines and its own competitive analysis and will let you know what we learn.

It would be good to see both “Broadwell” Xeon E3 v4 and “Haswell” Xeon E5 v3 machines tested and audited on a variety of benchmarks alongside Power8 machines by third parties. Customers should, as always, do precisely what Google is doing with Power8 machines, and that is run their own benchmarks using their own code. If IBM wants the Power platform to get 10 percent to 20 percent market share in the datacenter, as it has indicated is its goal, it is going to have to be a lot more aggressive about selling the architectural advantages and price competitiveness of its machines. The important thing is that IBM’s own competitive analysis shows that for certain kinds of workloads that are sensitive to memory bandwidth, a Power8-based machine is worth the hassle of a bake-off, provided IBM will make machinery available for such tests.

As we detailed last week, IBM has launched a pair of new Power8 machines, code-named “Habanero” and “Firestone,” that have one or two Power8 sockets, respectively, and are positioned against Intel’s two-socket Xeon servers for data analytics, database, and HPC workloads, depending on the configuration. The Habanero system is called the Power Systems S812LC and was designed in conjunction with Tyan while the Firestone system is called the Power Systems S822LC and was made with the assistance of Wistron. (Both are server ODMs and both are hoping to get some leverage and potentially higher margins with Power-based servers.)

IBM has already been talking up the relative performance of Power8 systems compared to Xeon E5 machines with regard to Spark in-memory analytics workloads, as we discussed a month ago when IBM put out some benchmarks on the SparkBench suite of tests, which strain systems with a mix of streaming, SQL, machine learning, and graph analytics jobs. Back in June during the ISC 2015 supercomputing conference, IBM released some relative performance figures on a variety of HPC workloads pitting Haswells against the Power8s.

In its presentations, IBM says that the single-socket Habanero Power8 machine, with a maximum of 1 TB, has twice the main memory of a single-socket Xeon E5 machine in the Haswell generation and sixteen times that of a Broadwell Xeon E3. (IBM is using DDR3 memory, while the Xeons are using DDR4, which runs faster and cooler.) By IBM’s tests, it says that a Habanero machine with a single ten-core Power8 chip running at 2.92 GHz can deliver the same Spark performance at less than half the cost of a two-socket Xeon E5-2690 v3 machine, which has 24 cores running at 2.6 GHz. IBM compared the Power S812LC and a Hewlett-Packard DL380 system, and says that the Habanero delivers 2.3X better bang for the buck on Spark work.

In the presentations that IBM put together for reseller partners, it says that this comparison was based on the ten tests in the SparkBench suite, and that across those tests, the IBM machine delivered 1.94X the performance of the Xeon E5 machine. Both machines were running Ubuntu Server 15.04, OpenJDK 1.8, and Spark 1.4.

To give a more general sense of how the Power S812LC stacks up against the Xeon E5 machines and its predecessor Power S812L systems, IBM put together this comparison based on SPECint_Rate integer benchmark tests:

As you can see, IBM has dialed back the performance on the LC variant of the single-socket Power8 machine, but it has dialed back the prices even further because of the change in processor and the move to standard DDR3 memory from its own custom memory. In the comparison above, IBM is assuming a 15 percent discount on the Power Systems LC hardware and a 20 percent discount on the software to get that total cost of acquisition (TCA) street price. The earlier Power Systems L and HP ProLiant systems have a 20 percent discount on both hardware and software to get that street price.
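The discounting described above can be sketched in a few lines. This is only an illustration of the arithmetic: the function name `street_price` and the list prices are placeholders invented for the example, not figures from IBM's slides; only the discount percentages come from the text.

```python
def street_price(hw_list: float, sw_list: float,
                 hw_discount: float, sw_discount: float) -> float:
    """Total cost of acquisition after separate hardware and software discounts."""
    return hw_list * (1.0 - hw_discount) + sw_list * (1.0 - sw_discount)


# Hypothetical list prices for illustration only; not taken from IBM's slides.
EXAMPLE_HW_LIST = 10_000.0
EXAMPLE_SW_LIST = 2_000.0

# Power Systems LC: 15 percent off hardware, 20 percent off software.
lc_street = street_price(EXAMPLE_HW_LIST, EXAMPLE_SW_LIST, 0.15, 0.20)

# Earlier Power Systems L and HP ProLiant: 20 percent off both.
other_street = street_price(EXAMPLE_HW_LIST, EXAMPLE_SW_LIST, 0.20, 0.20)

print(f"LC street price:    {lc_street:,.2f}")
print(f"Other street price: {other_street:,.2f}")
```

With identical list prices, the LC discount assumption is actually the less generous of the two, so any price advantage for the LC machines has to come from lower list prices rather than from the discounting.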

Workhorse To Workhorse

The real comparison that most companies will want to make is for two-socket machines, the workhorses of the datacenter.

On the Firestone Power S822LC machine, which has two sockets, IBM focused on relational database workloads. Big Blue tested a Firestone machine with two eight-core Power8 chips running at 3.6 GHz and with 256 GB of main memory against an HP ProLiant DL380 with two of the eighteen-core Xeon E5-2699 v3 chips. On the pgbench Postgres database test, the Power S822LC did around 33,000 transactions per second (TPS) per core, while the Xeon E5 did around 12,000 TPS per core. IBM wanted to talk about per core performance because that is where the spread is greatest, but if you work the math backwards, the Power8 Firestone machine could do about 528,000 TPS across the system and the ProLiant DL380 could do about 432,000 TPS.
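Working the math backwards, as described above, is just per-core throughput multiplied by the core count. A minimal sketch using the rounded figures quoted in the text; the function name `system_tps` is ours, not IBM's:

```python
def system_tps(tps_per_core: int, sockets: int, cores_per_socket: int) -> int:
    """Scale a per-core transactions-per-second figure up to the whole system."""
    return tps_per_core * sockets * cores_per_socket


# Firestone: two eight-core Power8 chips at roughly 33,000 TPS per core.
firestone_tps = system_tps(33_000, sockets=2, cores_per_socket=8)

# ProLiant DL380: two eighteen-core Xeon E5-2699 v3 chips at roughly 12,000 TPS per core.
proliant_tps = system_tps(12_000, sockets=2, cores_per_socket=18)

print(f"Firestone system TPS: {firestone_tps:,}")
print(f"ProLiant system TPS:  {proliant_tps:,}")
print(f"Per-core ratio:   {33_000 / 12_000:.2f}x")
print(f"System-level ratio: {firestone_tps / proliant_tps:.2f}x")
```

The gap between the per-core ratio and the system-level ratio is why the per-core view flatters the Power8 machine the most.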

Both of these machines had 256 GB of memory and ran Red Hat Enterprise Linux 7.1 on top of the KVM hypervisor and PostgreSQL 9.5 Alpha2. The way IBM prices the machines up, based on list prices, the Firestone machine offered about 40 percent better bang for the buck than the HP machine.

We would point out that this eighteen-core chip is very expensive compared to other Xeon E5 chips and PostgreSQL may or may not be able to take advantage of all of those threads. It would be illustrative to see if a pair of twelve-core Power8 chips, with 192 threads, would do much better than the Power8 machine IBM tested with a total of sixteen cores and 128 threads. The Intel machine had 36 cores and 72 threads. As is the case with all software, performance on any given hardware configuration will be dependent on how the software is architected. Sometimes it can take advantage of the threads, cache, and memory, and sometimes it can’t. The trick is to figure out the optimal configuration for your own workload, and that takes time and money.
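The thread counts quoted above follow from sockets, cores per socket, and hardware threads per core. A small sketch, assuming eight threads per Power8 core and two per Xeon core, which is what the totals in the text imply; the `ServerConfig` class is a hypothetical helper for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerConfig:
    name: str
    sockets: int
    cores_per_socket: int
    threads_per_core: int

    @property
    def cores(self) -> int:
        return self.sockets * self.cores_per_socket

    @property
    def threads(self) -> int:
        return self.cores * self.threads_per_core


configs = [
    ServerConfig("Power8 as tested (2 x 8 cores)", 2, 8, 8),
    ServerConfig("Power8 alternative (2 x 12 cores)", 2, 12, 8),
    ServerConfig("Xeon E5-2699 v3 (2 x 18 cores)", 2, 18, 2),
]

for cfg in configs:
    print(f"{cfg.name}: {cfg.cores} cores, {cfg.threads} threads")
```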

It would be fun to see how a “Broadwell” Xeon D chip aimed at hyperscalers would do against a Power8 LC machine with the same workload. (Give a mouse a cookie, he wants a glass of milk. . . .)

IBM also ginned up some SPECint_Rate benchmark comparisons for its old and new two-socket Power8 machines and the HP ProLiant DL380. Take a look:

Again, IBM is gearing back the memory bandwidth by half on the Power Systems LC machine compared to the Power L machine, and the clock speeds on the Power8 chips are also dialed back a bit. Therefore, the SPEC integer performance is also lower. But the cost of the Firestone Power Systems S822LC machine is so much lower that it matches the bang for the buck of a Xeon E5 machine with two of those eighteen-core Xeon E5-2699 v3 processors. The discounting to get the street price is the same as on the comparisons for the Power Systems S812LC machine.

The IBM POWER8 Review: Challenging the Intel Xeon

by Johan De Gelas on November 6, 2015 8:00 AM EST

Five years. That is how much time has passed since we have seen an affordable server processor that could keep up with or even beat Intel’s best Xeons. These days no less than 95% of the server CPUs shipped are Intel Xeons. A few years ago, it looked like ARM servers were going to shake up the market this year, but to cut a long story short, it looks like the IBM POWER8 chip is probably the only viable alternative for the time being.

That was also noticeable in our Xeon E7 review, which was much more popular than we ever hoped. One of the reasons was the inclusion of a few IBM POWER8 benchmarks. We admit, however, that the article was incomplete: the POWER8 development machine we tested was a virtual machine with only 1 core, 8 threads and 2 GB of RAM, which is not enough to do any thorough server testing.

After seeing the reader interest in POWER8 in that previous article, we decided to investigate the matter further. To that end we met with Franz Bourlet, an enthusiastic technical sales engineer at IBM, and he made sure we got access to an IBM S822L server. Thanks to Franz and the good people of Arrow Enterprise Computing Solutions, we were lent an IBM S822L server for our testing.

A Real Alternative?

Some of you may argue that the POWER based servers have been around for years now. But the slide below illustrates what we typically associated IBM’s POWER range with:

Proudly, the IBM sales team states that you can save 1.5 million dollars after you have paid them 2 million dollars for your high-end 780 system. There is definitely a market for such hugely expensive and robust server systems as high end RISC machines are good for about 50,000 clients. But frankly for most of us, those systems are nothing more than an expensive curiosity.

Availability can be handled by software, and most of us are looking, or are forced, to reduce our capital expenses rather than increase them. We want fast, "reliable enough" servers at low cost that are easy to service. And that is exactly the reason why single- and dual-socket Xeon servers have been so popular over the past decade. Can an IBM POWER server be a real alternative to the typical Xeon E5 server? The short but vague answer: a lot has changed in the past years and months. So yes, maybe.

Power8 vs x86 | Open Systems. DBMS

With the release of the Power8 processor and the formation of the OpenPower ecosystem, Big Blue has made a serious bid for a place among the solutions that address current trends: mobility, the popularity of social networks, and the ubiquity of clouds.

The success of a processor architecture today depends on the ecosystem built around it [1]. For example, one of the reasons for the longevity of x86 is the gigantic ecosystem that has developed over the 35 years of the architecture’s existence. The ARM architecture, which until recently was well known only in narrow circles, is just a few years younger, but it took the mobile revolution for it to move beyond embedded systems and compete with x86. The Power architecture came later still; its ecosystem is smaller, and attempts to use it outside of IBM have been few and far between. In 2013, recognizing the changes taking place and following the path of ARM, IBM opened up licensing of its new Power8 processor, which has the potential to compete seriously with x86, in the hope of creating a stronger ecosystem.

The appearance of Power8 brings back the “RISC vs. CISC” debate, which unfolded in the second half of the 1990s and ended with the victory of x86. DEC Alpha, MIPS, Motorola 88000, PA-RISC and many lesser-known processor architectures left the scene; SPARC survived, but in a niche, if a considerable one. Only Power and ARM remained as successful heirs to the once-promising RISC approach: the first was used in IBM products, and the second led a hidden life in a huge number of embedded devices. Acknowledging the leadership of x86, IBM compared each new generation of Power microprocessors with x86 models to show its advantages. The two architectures, however, did not really compete; they coexisted peacefully in different segments.

However one explains the success of x86, a single, not very well-designed and constantly modernized architecture cannot remain the hegemon for decades, spanning everything from netbooks to supercomputers; sooner or later something has to happen. Power8 is most likely the first serious bid for the role of x86 competitor. Supporting this claim is IBM’s sale of its Intel-based System x server business, which, like the PC business in 2005, went to Lenovo. To replace System x, Big Blue offers new servers in the Power Systems family with an “S” index indicating that they are scale-out machines. It is fair to assume that scale-up enterprise servers will appear soon, and their index will most likely be “E”.

OpenPower — Linux based Power

Compared to other RISC microprocessors, Power is in a better position, but the general trends are not favorable to it. Unless special measures are taken, its range of application will inevitably shrink, primarily because of the general decline in the popularity of the Unix operating system. The share of Unix servers is falling: according to Gartner analysts, it was 16% in 2013 and will drop below 9% by 2017, and that is the optimistic estimate. Analysts at IDC recorded a 31.3% decline in sales of Unix servers in the fourth quarter of 2013. True, the Unix crisis has affected IBM less than others: from 2002 to 2012, the corporation’s share of this segment grew from 14% to 55%. But that happened because other manufacturers fell away, which does not bode well overall.

The corporation sees a way out in the open development strategy of OpenPower, drawing on its own precedent of opening up the schematics and BIOS of the IBM 5150, which gave rise to the IBM PC standard and, with it, the colossal PC market. The second precedent is the rise in popularity of ARM, which followed the opening of the specifications for the Advanced RISC Machine architecture. OpenPower sits somewhere between these two. It assumes closer technological integration than in the case of the IBM PC, while still allowing licensees to release competing products, which clearly encourages development. In addition, OpenPower is backed by technology built into Power8, in particular the Coherent Accelerator Processor Interface (CAPI), an accelerator interface to memory that allows chips from different manufacturers to be integrated at the board level.

The OpenPower Consortium was formed in August 2013 to create an ecosystem for developing Linux on Power. Google was the first major member of the OpenPower coalition; today it includes about fifty organizations, a significant share of them from China.

The first OpenPower member to announce its intention to produce processors was the Chinese startup Suzhou PowerCore, founded in 2013, which plans to design its own version of Power8 within two years and begin production at an IBM factory in the United States. Over time, production may be moved to China. PowerCore is one of the six processor projects of the Chinese Academy of Sciences, the best known of which are Loongson, a variation on the MIPS theme used in High Performance Computing (HPC) clusters [2], and FeiTeng, a SPARC clone. Another Chinese company, China Core Technology, licensed the IBM PowerPC instruction set in 2010 and builds systems on a chip (System-on-Chip, SoC); it has its own plans to use Power8. The modest number of companies interested in making processors is most likely explained by the fact that most of the SoC manufacturers that might have been attracted by the OpenPower idea have already opted for 64-bit ARMv8 licenses. Thanks to the CAPI interface, systems on a chip will be released by companies such as Nvidia, Altera, Suzhou PowerCore, Xilinx and VeriSilicon; I/O systems, memory and network modules will be made by Tyan, Chuanghe Telco Tech, Servergy, Inspur and ZTE; and application software will come from Teamsun, Google and Juelich.

Through the OpenPower ecosystem, IBM is creating a seamless infrastructure in which every participant in a competitive market receives a license. It was this scheme that once helped create the PC market, and it is telling that the Chinese company PowerCore already offers servers of its own making at a lower price than IBM itself.

For many years, IBM has been and remains “number one” in complex professional systems, from mainframes to Watson, but new trends are emerging, above all private and public clouds: giant data centers designed to run data-processing applications. These are dominated by servers running Linux and Windows, which form the basis of scale-out systems that are usually not serviced but simply replaced when they fail. After AMD left the market for high-performance standard-architecture systems, Intel became a monopoly; servers built under ARM licenses do not threaten the Xeon E7 and E5 and pose only a slight threat to the E3. This is exactly the niche IBM is counting on as it rolls out the OpenPower ecosystem.

However, the success of IBM’s initiative depends directly on the “Google factor”: if that company gives the go-ahead to move a significant part of its load to Power, a bright future for Power8 is assured. Some observers have their doubts, calling the initiative “too little, too late”, but there is hope, and it rests on IBM’s “secret weapon”, CAPI. What is qualitatively new about CAPI is that it makes it possible to build heterogeneous systems at the motherboard level. Heterogeneity is a characteristic feature of modern cloud data centers, which are equipped with almost every type of computer available today. Internally, however, all of these servers are homogeneous, built only on conventional central processing units (CPUs). CAPI opens up the possibility of supplementing them with accelerators, primarily graphics processing units (GPUs) from Nvidia and FPGAs from Altera and Xilinx. The role of accelerators in improving the performance of HPC clusters is the subject of [3]; additional devices can have roughly the same effect in the data center. Thanks to CAPI, IBM has no competition at the heterogeneous-motherboard level, since Intel offers nothing of the kind and AMD is focusing on its Kaveri heterogeneous processors (APUs).

Power8 Features

The Power8 microprocessor includes caches of several levels, PCI-Express and DDR memory controllers, and numerous accelerators that increase the performance of each core and of the system as a whole. The cores are connected to memory using NUMA technology, which provides distributed access, including across several processors installed on the same board.

In its maximum configuration, Power8 has 12 cores (Power7 had eight), which, unlike their predecessor, operate in single-threaded mode and in three hardware simultaneous multithreading (SMT) modes, SMT2, SMT4 and SMT8, which divide the core into two, four or eight threads. Each SMT thread is a logical processor capable of running one of three operating systems: Linux, AIX, or IBM i (formerly OS/400). A one-and-a-half-fold increase in cache size improves single-threaded performance compared to Power7+, despite the lower clock speed. How effectively the threading potential is used depends on the software: the listed operating systems use different mechanisms for managing threads and logical partitions. In terms of raw numbers, Power8 (die area 650 sq. mm, 22 nm manufacturing process, 4.2 billion transistors) differs markedly from Power7+ (567 sq. mm, 32 nm and 2.1 billion, respectively). The core consists of the following units: two load/store units (Load Store Unit, LSU), a condition register unit (Condition Register Unit, CRU), a branch register unit (Branch Register Unit, BRU), an instruction fetch unit (Instruction Fetch Unit, IFU), two fixed-point arithmetic units (Fixed-Point Unit, FXU), two vector units (Vector Math Unit, VMX), a decimal floating-point unit (Decimal Floating Unit, DFU) and one cryptographic unit (Cryptographic Unit).

The memory controller, figuratively named Centaur, has no analogues: it combines two functions, acting partly as a fourth-level cache (L4) and partly as the controller itself. In the current implementation, Centaur is adapted to work with DDR3 memory, but it can migrate to DDR4 in the future. This approach is called a technology-neutral memory controller, and the need for it stems from the different refresh cycles of processors and memory: the processor cycle is usually two years, the memory cycle five. The transfer and cache control logic is placed in the separate Centaur chip, which is connected to the processor by a bus with a transfer rate of 9.6 GB/s and a latency of 40 ns. Each processor can support up to eight of these chips, which brings the L4 size up to 128 MB. The orientation toward large data volumes shows in the high transfer rates between processors and memory: each socket drives eight channels with a total sustained rate of up to 230 GB/s and can support up to 1 TB of memory. Memory operation is organized on transactional principles.
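The L4 figure above scales with the number of Centaur chips attached to a socket. A minimal sketch of that relationship, assuming the cache is split evenly across chips, so that the per-chip size is simply the 128 MB maximum divided by eight; the function name `l4_capacity_mb` is ours:

```python
MAX_CENTAUR_CHIPS = 8   # per processor, as stated in the text
MAX_L4_MB = 128         # L4 size with all eight chips attached

# Per-chip L4 size implied by the two figures above (assumes an even split).
L4_PER_CHIP_MB = MAX_L4_MB // MAX_CENTAUR_CHIPS


def l4_capacity_mb(centaur_chips: int) -> int:
    """L4 cache available to one socket with the given number of Centaur chips."""
    if not 0 <= centaur_chips <= MAX_CENTAUR_CHIPS:
        raise ValueError(f"expected 0..{MAX_CENTAUR_CHIPS} chips, got {centaur_chips}")
    return centaur_chips * L4_PER_CHIP_MB


for chips in (2, 4, 8):
    print(f"{chips} Centaur chips: {l4_capacity_mb(chips)} MB of L4")
```

A partially populated socket therefore gives up L4 capacity along with memory channels.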

The built-in PCI-Express 3.0 controller provides the transport layer for CAPI, which makes it possible to connect accelerators directly to the PCI bus and greatly simplifies integration. CAPI allows direct data exchange through memory between Power8 and the GPUs, FPGAs and DSPs connected to it; with CAPI, these accelerators behave as if they were placed directly on the chip.

Power8 based servers

The names of the first Power8-based server models are coded: “S” stands for scale-out and “8” for the Power8 family, and the last two digits give the number of sockets and the height in U. Servers can run two or three of the operating systems listed above; if the name contains an “L”, the server runs only Linux. 2U non-L servers can run AIX and Linux, and 4U servers can also run IBM i.
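The naming scheme just described can be decoded mechanically. The sketch below covers only the rules stated in the text (an "S8" prefix, a socket digit, a height digit, and an optional "L"); the `PowerModel` class and `parse_model` function are hypothetical helpers, and other suffixes such as "LC" are deliberately rejected because the scheme described here does not cover them.

```python
import re
from dataclasses import dataclass

# "S" = scale-out, "8" = Power8, then sockets, then height in U, then optional "L".
_MODEL_RE = re.compile(r"S8([1-9])([1-9])(L?)")


@dataclass(frozen=True)
class PowerModel:
    sockets: int
    height_u: int
    linux_only: bool

    @property
    def operating_systems(self) -> tuple:
        if self.linux_only:
            return ("Linux",)
        if self.height_u == 4:
            return ("AIX", "Linux", "IBM i")
        return ("AIX", "Linux")


def parse_model(name: str) -> PowerModel:
    """Decode a model name such as 'S822L' according to the rules in the text."""
    match = _MODEL_RE.fullmatch(name.strip().upper())
    if match is None:
        raise ValueError(f"model name not covered by this scheme: {name!r}")
    sockets, height, linux = match.groups()
    return PowerModel(int(sockets), int(height), linux == "L")


for model in ("S812", "S822L", "S814", "S824"):
    print(model, parse_model(model), parse_model(model).operating_systems)
```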

Servers in the 2U design come in different configurations depending on whether one or both sockets are used. The single-socket version (S812) has 6 or 10 cores and a maximum memory size of 512 GB, with six low-profile PCIe Gen3 adapters. With both sockets, the core counts double to 12 and 20, memory doubles to 1 TB, and the number of adapters rises to nine. The S822L is a Linux-only model and can run under the PowerVM or PowerKVM hypervisors. The 4U servers, the S814 and S824, differ in core count, memory size and number of PCIe Gen3 adapters: the S814 has 6 or 8 cores and the S824 has 12, 16 or 24; memory is 500 GB or 1 TB; and there are 7 or 11 adapters. Both can run all three operating systems.

The use of the PowerKVM hypervisor opens up opportunities to attract a wide range of open source software developers. The original Kernel-based Virtual Machine (KVM) is an open source, enterprise-grade hypervisor that provides the performance, reliability, and scalability needed to run heavy workloads on Windows and Linux, while being significantly more cost effective than other commercial x86 hypervisors.

Servers based on Power8 follow current trends closely: they are designed less for computation and more for data processing (unprecedented memory sizes, high I/O speeds, many cores and many threads). But perhaps the main point is that IBM is adding scale-out, a new space for the company, to its time-tested practice of vertical scaling. What used to be associated with mass-market servers is now being implemented on professional-grade equipment with performance to match; in an environment where labor costs more than equipment, this becomes critical.

The strengths of the new servers are high performance on IBM applications; traditionally high-quality support; the market potential of the various accelerators connected via CAPI; and the Chinese factor, meaning the availability of good-quality unbranded equivalents (white boxes) that are cheaper than brand-name machines. There are also problems, however: a small number of partners capable of producing systems on a chip, the large size and high cost of the processor, a lack of experience in building hyperscale systems, and the existence of a more developed x86 ecosystem and of the ARM consortium, which already has a three-year history.

***

History repeats itself, as it has many times: Big Blue, because of its sheer mass, does not respond to change immediately and lets other players go ahead during the formative stage of a market segment, but once it sees the profit to be made, it mobilizes serious forces. Sensing the potential of clouds, mobility and social networking, IBM has introduced the Power8 processor and the OpenPower ecosystem, which together promise the market a great deal.

Literature

  1. Sergey Avdoshin, Elena Pesotskaya. Mobile Ecosystems // Open Systems.DBMS. 2014. No. 2. P. 32–34. URL: http://www.osp.ru/os/2014/02/13040044 (accessed 06/18/2014).
  2. Dmitry Volkov. Strategic IT: Chinese Surprise No. 863 // Open Systems.DBMS. 2010. No. 3. P. 32–37. URL: http://www.osp.ru/os/2010/03/13001879 (accessed 06/18/2014).
  3. Leonid Chernyak. HPC: the end of the Cretaceous // Open Systems.DBMS. 2010. No. 3. P. 15–19. URL: http://www.osp.ru/os/2010/03/13001851 (accessed 06/18/2014).

Leonid Chernyak, scientific editor, Open Systems. DBMS (Moscow).

Intel introduces 24-core processor for servers

    Intel has increased the number of cores in its server processors to 24. That is the maximum core count in chips from the new Xeon E7 v4 family. The manufacturer claims that the flagship processor of the family is 1.4 times faster than its counterpart from the IBM Power8 line.

    24-core processors

    Intel introduced a new family of server processors, the Xeon E7 v4 (codenamed Broadwell-EX). It includes seven chips with between 4 and 24 cores. The previous-generation chips announced in 2015 (Xeon E7 v3) had a maximum of 18 cores.

    Family members

    The family includes the 24-core Xeon E7-8890 v4 (2.2 GHz, 60 MB of cache, 165 W) and the rest of the E7-8800 group, with frequencies from 2.1 to 3.2 GHz, 45 to 60 MB of cache and a power consumption of 140 to 165 W.

    Also included in the announcement are the Xeon E7-4800 v4 chips, which have eight to 16 cores, 20-40 MB of cache, and a power consumption of 115 watts.

    Purpose

    Xeon E7 chips are designed for mission-critical applications. They are suited to running DBMSs, virtual machines and business applications. Customers for previous generations of the Xeon E7 include, for example, financial institutions that use them to verify transactions and protect against attacks in real time.

    Server configurations

    The 24-core Xeon E7 processors can be installed in four- to eight-socket servers, giving up to 192 cores in one system. The maximum amount of RAM supported in this case is 24 TB.

    So far, about 18 server manufacturers have announced plans to use the new chips. Among them are Lenovo, Dell, SGI and Fujitsu. The last two intend to offer 32-socket servers with a total of 768 cores.

    Intel reported that a server with 192 cores, 2 TB of RAM and two hard drives will cost the customer approximately $165,000.
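The core counts quoted for these configurations are simply sockets multiplied by the 24 cores of the flagship chip. A short sketch of that arithmetic follows; the per-core price is our own derived figure for the $165,000 example configuration, not a number Intel published.

```python
CORES_PER_CHIP = 24  # flagship Xeon E7 v4 core count from the text


def total_cores(sockets: int, cores_per_chip: int = CORES_PER_CHIP) -> int:
    """Cores in a system built from identical chips."""
    return sockets * cores_per_chip


eight_socket = total_cores(8)
thirty_two_socket = total_cores(32)

# Derived illustration: cost per core of the 192-core example system.
EXAMPLE_SYSTEM_PRICE_USD = 165_000
price_per_core = EXAMPLE_SYSTEM_PRICE_USD / eight_socket

print(f"8 sockets:  {eight_socket} cores")
print(f"32 sockets: {thirty_two_socket} cores")
print(f"Approximate price per core: ${price_per_core:,.2f}")
```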

    Competition with IBM Power8

    Intel believes that the Xeon E7 v4 can compete with IBM Power8 server chips. For example, the flagship E7-8890 v4 offers 1.4 times higher performance than its Power8 counterpart at half the operating costs in an 8-socket system, Intel said. At the same time, per dollar spent, the E7-8890 v4 offers 10x better performance than the competition, the company added.