Intel xeon 5500 performance: Intel Xeon X5550 Benchmarks — Geekbench Browser

PRESS KIT — Intel® Xeon® Processor 5500 Series

Intel Xeon 5500 Series CPUs Processors for Dell PowerEdge R710 Servers


The Dell PowerEdge R710 server will accept up to two (2) Intel Xeon 5500 series processors.

Designed for industry-leading performance and maximum energy efficiency, the Intel Xeon processor 5500 sequence delivers versatile one-way and two-way 64-bit multi-core servers and workstations that are ideally suited for a wide range of infrastructure, cloud, high-density, and high-performance computing (HPC) applications.

The Intel Xeon processor 5500 series offers up to nine times the performance of single-core servers. It automatically places CPUs and memory into an optimal power state for maximum performance, while reducing energy use compared with single-core servers. Users can also mix different generations of Intel Xeon processors within the same pool, achieving a 9:1 server consolidation ratio while reducing operating costs by up to 90 percent, for an estimated eight-month ROI.

Select processors in the Intel Xeon 5500 series also support Turbo Mode. Turbo Mode is an OS-controlled operation that automatically allows the processor to run faster than the marked frequency if the processor is operating below power, temperature, and current limits.

Unless otherwise noted, processors have been tested and removed from working servers and are backed by our 90-day parts replacement warranty. Heatsinks are sold separately.






















Intel Xeon 5500 Series Processors (Gainestown)

Model sSpec Frequency Turbo Cores L2 cache (per core) L3 cache QPI TDP
Dual Core
Xeon E5502 SLBEZ 1.87 GHz No 2 256 KB 4 MB 4.8 GT/s 80 W
Xeon E5503 SLBKD 2.0 GHz No 2 256 KB 4 MB 4.8 GT/s 80 W
Dual Core, low power
Xeon L5508 SLBGK 2.0 GHz Yes 2 256 KB 4 MB 4.8 GT/s 38 W
Quad Core
Xeon E5504 SLBF9 2.0 GHz No 4 256 KB 4 MB 4.8 GT/s 80 W
Xeon E5506 SLBF8 2.13 GHz No 4 256 KB 4 MB 4.8 GT/s 80 W
Xeon E5507 SLBKC 2.27 GHz No 4 256 KB 4 MB 4.8 GT/s 80 W
Xeon E5520 SLBFD 2.27 GHz Yes 4 256 KB 8 MB 5.86 GT/s 80 W
Xeon E5530 SLBF7 2.4 GHz Yes 4 256 KB 8 MB 5.86 GT/s 80 W
Xeon E5540 SLBF6 2.53 GHz Yes 4 256 KB 8 MB 5.86 GT/s 80 W
Xeon X5550 SLBF5 2.67 GHz Yes 4 256 KB 8 MB 6.4 GT/s 95 W
Xeon X5560 SLBF4 2.8 GHz Yes 4 256 KB 8 MB 6.4 GT/s 95 W
Xeon X5570 SLBF3 2.93 GHz Yes 4 256 KB 8 MB 6.4 GT/s 95 W
Quad Core, low power
Xeon L5506 SLBFH 2.13 GHz No 4 256 KB 4 MB 4.8 GT/s 60 W
Xeon L5518 SLBFW 2.13 GHz Yes 4 256 KB 8 MB 5.86 GT/s 60 W
Xeon L5520 SLBFA 2.27 GHz Yes 4 256 KB 8 MB 5.86 GT/s 60 W
Xeon L5530 SLBGF 2.4 GHz Yes 4 256 KB 8 MB 5.86 GT/s 60 W


Intel Xeon 5600 (Westmere-EP) and 7500 (Nehalem-EX) Performance

Intel launched the six-core Xeon 5600 series (Westmere-EP, 32nm) processors
on 16 March 2010 without any TPC benchmark results.
In the performance world, the absence of results almost always means the results were poor.
Yet there is every reason to believe that the six-core Xeon 5600 series
(X models only) will perform exactly as expected for a 50% increase
in the number of cores at the same frequency as the 5500,
with no system-level bottlenecks.
The expectation is that a six-core Xeon 5600 should provide a 30%+ improvement
over the comparable quad-core Xeon 5500
in throughput-oriented tests, particularly OLTP-type workloads.
Single-stream parallel execution plans will probably show less gain,
as scaling via parallelism is not a simple matter.

Then two weeks later on 30 March 2010,
Intel launched the Xeon 7500 series 8-core processors
for 4-way+ systems (and the Xeon 6500 for high-end 2-way systems)
with TPC-E results on 4-way and 8-way systems but no TPC-H results.
The TPC-E results were exactly what Intel said they were going to be last September at IDF,
2.5X over the previous generation Xeon 7400 series
and 2.5X over the contemporary 2-way Xeon 5500 series.


My guess is that Intel wanted it to be clear that the 4-way Xeon 7500
achieved the stated performance objectives of 2.5X over the 2-way Xeon 5500,
just in case some slide-decks did not mention which 2-way system
the 2.5X claim referred to.
Of course, the Intel statement of 2.5X for Xeon 7500 was most probably
based on performance measurements already run on prototype systems.
It was probably also felt that the Xeon 5600 series is such a natural choice
to supersede the 5500 series that TPC benchmarks were not essential,
as there were sufficient other benchmarks to support the claims.

Benchmark Omissions


Earlier, I had commented about benchmark omissions from the quad-core generation on.
Below is a summary of processors and systems for which TPC results are published.
(The Intel Xeon 7500 Processor Product Brief shows 3.03X relative to the 7400 for
OLTP Brokerage Database, which is TPC-E, but 2,022 over 729 is 2.77X.
See Performance Benchmarks for updates.)









Processor (architecture, process)       TPC    2-way        4-way         8-way          16-way
Core2 65nm (Xeon 5300 QC, 7300 QC)      TPC-C  251,300      407,079       841,809        -
                                        TPC-E  5160 only    479.51        804.0          1,250.0
                                        TPC-H  17,686@100   34,990@100    46,034@300     -
Barcelona 65nm (QC)                     TPC-C  -            471,883       -              -
                                        TPC-E  -            -             -              -
                                        TPC-H  -            -             52,860@300     -
Core2 45nm (Xeon 5400 QC, 7400 6C)      TPC-C  275,149      634,825       Linux DB2      -
                                        TPC-E  317.45       729.65        1,165.56       2,012.8 (R2)
                                        TPC-H  -            -             -              102,778@3T
Shanghai 45nm (QC)                      TPC-C  -            579,814       -              -
                                        TPC-E  -            635.4         -              -
                                        TPC-H  -            -             57,685@300G    -
Istanbul 45nm (6C)                      TPC-C  -            -             -              -
                                        TPC-E  -            -             -              -
                                        TPC-H  -            -             91,558@300G*   -
Nehalem 45nm (Xeon 5500 QC, 7500 8C)    TPC-C  661,475†     -             -              -
                                        TPC-E  850.0        2,022.64      3,141.76       -
                                        TPC-H  51,086@100G  -             162,601@3TB    -
Westmere 32nm (Xeon 5600 6C, 7600 ?C)   TPC-C  803,068      future        future         future
                                        TPC-E  1,110        future        future         future
                                        TPC-H  -            future        future         future
Magny-Cours 45nm (12C)                  TPC-C  705,652      1,193,472     n/a            n/a
                                        TPC-E  887.4        1,464         n/a            n/a
                                        TPC-H  -            107,561@300G  n/a            n/a

* and SF 1TB report

† Xeon W5580 3.2GHz, versus X5570 2.93GHz

Magny-Cours will not support >4 socket systems


In brief, the Intel Core 2 architecture processors were avoiding comparisons against
AMD Opteron in TPC-H, except for the 16-way Unisys system, for
which there is no comparable Opteron system.


Opteron on the other hand, avoided comparison with Core2 architecture
in 2-way systems and TPC-C/E OLTP benchmarks across the board.
In the 2-way systems, the Intel old-FSB technology was still adequate,
and the powerful Core2 architecture core was enough to beat a 2-way Opteron.
There were respectable 4-way TPC-C and TPC-E results for Shanghai.
When AMD announced the HT-Assist feature in Istanbul,
one might have thought AMD was finally going to be able to compete in 4-way OLTP.
But no benchmark results have been published so far.

When the 2-way Intel Xeon 5500 processor, based on the Nehalem architecture,
came out in early 2009, outstanding results were published for both
the OLTP-oriented TPC-E and the DW/DSS-oriented TPC-H.
In February 2010, a TPC-C was published as well,
even though Microsoft had previously said all new OLTP benchmarks were going to be TPC-E.
This result was with SQL Server 2005 instead of 2008.


There was every expectation with the Xeon 7500 Nehalem-EX,
that there would be both OLTP and DW/DSS benchmark results,
as Xeon 7500 should produce world-class (and world-record) results in both.
It is possible that performance problems were encountered in trying to
achieve good scaling over 32-cores and 64-threads in a 4-way Xeon 7500 system.
If this is identified as something that can be fixed in the Windows operating system
or SQL Server engine, then a change request would be made.
I seriously doubt that another processor stepping would be done for this,
as Xeon 7500 is already D-step at release.

TPC-H Scaling


It is also quite possible Intel will have to face the fact that
2.5X the 2-way Xeon 5500 TPC-H SF100 result of 51,000 QphH is not going to be achieved
no matter how good the Xeon 7500 is at DW.
This is because the TPC-H score is based on a geometric mean over the 22 queries.
There are several small queries in TPC-H, two of which already run in under 1 second
on the 2-way 8-core Xeon 5570 for SF100, and several that run near or under 2 seconds.
There is limited opportunity to continue improving the performance of small queries
with an increasing degree of parallelism, as the overhead to set up each thread
becomes larger compared to the actual work done by each thread,
especially if one also has to give up frequency, dropping from 2.93 to 2.26GHz.
It would be helpful to know what the actual frequency is during a performance run
with the turbo-boost feature.
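
To see why the geometric mean limits the overall gain, here is a minimal Python sketch. The query times are hypothetical (six short queries and sixteen long ones), not taken from any published TPC-H report, and the official Power metric also includes refresh functions and the scale factor, which are left out here. It assumes the long queries speed up by 2.5X while the short ones do not improve at all.

```python
import math

def geometric_mean(values):
    """Geometric mean of positive numbers, computed through logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical run times in seconds for 22 queries (illustrative only).
small_queries = [1.0] * 6     # already short; assumed not to improve
large_queries = [20.0] * 16   # assumed to speed up by 2.5X

baseline = small_queries + large_queries
scaled = small_queries + [t / 2.5 for t in large_queries]

# A lower geometric-mean run time corresponds to a higher score.
overall_speedup = geometric_mean(baseline) / geometric_mean(scaled)

print("speedup on the large queries: 2.50X")
print(f"speedup in the geometric mean: {overall_speedup:.2f}X")
```

Under these assumptions the geometric mean improves by 2.5^(16/22), roughly 1.95X, even though every large query met the 2.5X target.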


It is possible that some marketing putz does not understand this and
denied permission to publish perfectly good Xeon 7500 TPC-H results
because they did not meet the 2.5X goal.
(Along with making a negative ranking and review entry
for the person responsible for TPC-H benchmarking for failing to achieve the 2.5X goal.
But let's not grind axes here. Besides, who said life was fair?
It takes exceptional talent to accomplish the impossible.
A clever person anticipates impossible problems, and transfers to another group
to avoid a sticky wicket.)


Achieving 2.5X in the big queries is a more meaningful goal.
Achieving 50% better than the 8-way Opteron 6-core TPC-H SF300 or SF1TB would also
be a worthwhile accomplishment, if the Xeon 7500 were up to the task.

TPC-E Scaling


Finally, a quick comment on Xeon 7500 scaling from 4-way (32 cores, 64 threads)
to 8-way (64 cores, 128 threads).
In the past, achieving 1.5X scaling with this number of cores would have been a triumph.
Given the announcement Microsoft made on Windows Server 2008 R2,
on removing thread scheduler bottlenecks and other impediments to high-end scaling,
we were expecting 1.7X scaling.
It could be that scaling beyond 64 threads is tricky,
because of the limit of 64 logical processors per processor group.
Hopefully the 4-way to 8-way to 16-way scaling will improve over time as problems
are solved one at a time, while the task master whips his/her draft horses
(again, I digress).
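
The scaling factor can be checked with a short Python sketch. It takes the two Nehalem TPC-E figures from the summary table above (2,022.64 and 3,141.76), on the assumption that they are the 4-way and 8-way Xeon 7500 results; the function and variable names are illustrative only.

```python
def scaling_factor(smaller_system_result, larger_system_result):
    """Throughput ratio obtained when moving to the larger system."""
    return larger_system_result / smaller_system_result

# TPC-E figures from the summary table (assumed 4-way and 8-way Xeon 7500).
tpce_4way = 2022.64
tpce_8way = 3141.76

factor = scaling_factor(tpce_4way, tpce_8way)
efficiency = factor / 2.0  # fraction of the ideal 2X for twice the sockets

print(f"4-way to 8-way scaling: {factor:.2f}X")
print(f"scaling efficiency: {efficiency:.0%}")
```

On these figures the measured scaling sits between the 1.5X that would once have been a triumph and the 1.7X that was hoped for.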

Intel Xeon 5600 (Westmere-EP) and 7500 (Nehalem-EX) SKUs

Let's take a look at the Xeon 5600, 7500 and 6500 SKUs.
The low-voltage, low-power SKUs are omitted.
These are fine products for high-density environments, web servers, and utility databases.
Line-of-business and DW databases should be on the X models.
Xeon 5600 SKUs

Model Cores Threads GHz Turbo L3 QPI GT/s Memory Power Price*
X5680 6 12 3.33 3.6 12M 6.4 1333 130 $1,663
X5670 6 12 2.93 3.33 12M 6.4 1333 95 $1,440
X5660 6 12 2.80 3.2 12M 6.4 1333 95 $1,219
X5650 6 12 2.66 3.06 12M 6.4 1333 95 $996
E5640 4 8 2.66 2.93 12M 5.86 1066 80 $774
E5630 4 8 2.53 2.8 12M 5.86 1066 80 $551
E5620 4 8 2.40 2.66 12M 5.86 1066 80 $387
X5677 4 8 3.46 3.73 12M 6.4 1333 130 $1,693
X5667 4 8 3.06 3.46 12M 6.4 1333 95 $1,440

* Intel 1k pricing

Xeon 7500 SKUs

Model Cores Threads GHz Turbo L3 QPI GT/s Memory Power Price*
X7560 8 16 2.26 2.66 24M 6.4 1066? 130 $3,692
X7550 8 16 2.00 2.4 18M 6.4 ? 130 $2,729
E7540 6 12 2.00 2.26 18M 6.4 ? 105 $1,980
E7530 6 12 1.86 2.13 18M 5.86 ? 105 $1,391
E7520 4 8 1.86 1.86 18M 4.8 ? 95 $856
X7542 6 6 2.66 2.8 18M 5.86 ? 130 $1,980

Xeon 6500 SKUs

Model Cores Threads GHz Turbo L3 QPI GT/s Memory Power Price*
X6550 8 16 2.00 2.4 18M 6.4 ? 130 $2,461
E6540 6 12 2.00 2.26 18M 6.4 ? 105 $1,712
E6510 4 8 1.73 1.73 12M 4.8 ? 105 $744

Before commenting, recall the main differences
between the Xeon 5600 and Xeon 7500/6500 series.
The Xeon 5600 series (32nm process) has 2 QPI links and 3 memory channels.
The Xeon 7500 series (45nm process) has 4 QPI links, 4 memory channels,
a larger cache per core (for the 24M version, 3M vs 2M),
plus extensive reliability features.
The 2 QPI links on the 5600 series allow a 2-way (socket) system.
The 4 QPI links on the 7500 series allow glueless 4-way and 8-way systems.
My understanding is that the 6500 series is the 7500 with only 2 QPI links enabled,
for 2-way systems with 16 cores and 8 memory channels total, at a lower frequency
than the 5600 with 12 cores and 6 memory channels total,
plus the 7500 RAS features.

Intel Xeon 5600 (Westmere-EP) and 7500 (Nehalem-EX) Systems

Now let's look at system pricing for the 2-way Dell PowerEdge T710 (Xeon 5600),
R810 (either 7500 or 6500) and the 4-way R910 (7500).
All systems are configured with redundant power supplies and 2x73GB 15K 2.5in drives on 6Gb/s SAS,
with 4 power supplies in the 4-way.

Dell PowerEdge T710 Systems with 2 Xeon 5600 processors

System Processor GHz Cores L3 QPI Memory Price
T710 X5680 3.33 6 12M 6.4 1333 72GB 18x4G $9,974
T710 X5660 2.80 6 12M 6.4 1333 72GB 18x4G $8,634
T710 X5650 2.66 6 12M 6.4 1333 72GB 18x4G $8,154
T710 E5640 2.66 4 12M 5.86 1066 72GB 18x4G $7,474
T710 E5630 2.53 4 12M 5.86 1066 72GB 18x4G $6,934

For some reason, Dell does not offer the T710 with the second from top X5670 2.93GHz.

Dell PowerEdge R810 Systems with 2 Xeon 7500 or 6500 processors

System Processor GHz Cores L3 QPI Memory Price
R810 X7560 2.26 8 24M 6.4 1066 64GB 16x4G $17,866
R810 X7542 2.66 6 12M 5.86 ? 64GB 16x4G $13,366
R810 X6550 2.00 8 18M 6.4 1066 64GB 16x4G $13,066
R810 E7540 2.00 6 18M 6.4 1066 64GB 16x4G $12,166
R810 E6540 2.00 6 18M 6.4 1066 64GB 16x4G $11,496

Dell PowerEdge R910 Systems with 2 out of 4 sockets populated, Xeon 7500

System Processor GHz Cores L3 QPI Memory Price
R910 X7560 2.26 8 24M 6.4 1066 64GB 16x4G $19,246
R910 X7550 2.00 8 18M 6.4 1066 64GB 16x4G $16,446
R910 E7540 2.00 6 18M 6.4 1066 64GB 16x4G $13,546
R910 E7530 1.86 6 18M 5.86 980 64GB 16x4G $12,446

Dell PowerEdge R910 Systems with 4 Xeon 7500 processors

System Processor GHz Cores L3 QPI Memory Price
R910 X7560 2.26 8 24M 6.4 1066 128GB 32x4G $34,040
R910 X7550 2.00 8 18M 6.4 1066 128GB 32x4G $28,440
R910 E7540 2.00 6 18M 6.4 1066 128GB 32x4G $22,640
R910 E7530 1.86 6 18M 5.86 980 128GB 32x4G $20,440

Previously, I had argued that processors and systems today are so powerful
that the standard practice of buying 4-way systems for critical database servers
by default should be changed to 2-way.
What I mean by default is in lieu of a proper system sizing analysis.


It may seem strange that I suggest not doing a proper sizing analysis
(one of my services as a consultant).
But from the sizing analysis I have seen done by other people,
the quality of the work was poor and
the effort cost more than a pair of 4-way systems.


What this means is that the practical solution used to be to buy a 4-way system.
Try it out.
If it is not sufficient,
then hire someone (there are many people who can do this) to make it work on a 4-way.
If that does not work, consider pruning features until it does work.


So why not just move up to an 8-way or larger system?
Because 8-way and larger are mostly NUMA systems.
Technically, all Opteron 2-way and up are NUMA.
But by NUMA, I really mean systems where there is a
large discrepancy between local and remote node memory access.
There are very very few people who can do performance analysis on a NUMA system
(not those who claim to be able to).
Do a search on SQL NUMA to see who has published meaningful material on this matter.

Default System Choice: Intel Xeon 5600


Anyways, the default choice today should be a 2-way system.
However, since this is a critical system,
perhaps there are features from the high-end that we want.
I believe this is the rationale for the Xeon 6500 from Intel,
and the PowerEdge R810 from Dell.


In looking over the T710, R810 and R910,
I am inclined to say the effort was not entirely successful,
as with many first iterations.
The effort definitely deserves merit,
and is the proper direction for the future.
But it just needs further refinement.
Of course, the true measure is whether people actually buy the R810 in volume,
not just one person's opinion.


The R810 with either X7560 or X6550 just gives up too much frequency
for the extra 2 cores per socket and the fourth memory channel.
Some environments might want the Xeon 7500/6500 RAS features despite this.
And there is only a $1,400 price difference between the R810 and R910
with 2 sockets populated.


The amount of $1,400 is very small for having two extra sockets available,
even though most people never populate sockets after system purchase.
It would be nice if one could buy the R910 with 4 sockets populated,
but not have to pay the per-socket software licensing until they are turned on,
as in the RISC world.


True, the R810 is a 2U form factor compared with 4U for the R910,
allowing much higher density.
But the assumption was this is a critical database server,
for which an extra 2U is not a show stopper.
(There are people who get hung up on the latest industry jargon/fads,
and forget that job one is making sure your business is running.)

Late Addition: AMD Opteron 6100, Magny-Cours

AMD Opteron 6176 (Magny-Cours) 2-way 12-core results have just been published, with the HP ProLiant DL385G7.
I will add more detail later. The 2-way TPC-E result is 887.38 and the TPC-C result is 705,652. Interestingly, both the HP ProLiant DL370G6 with the Xeon W5580 and the DL385G7 Opteron TPC-C results are on SQL Server 2005. Perhaps the Microsoft mandate to use TPC-E is for SQL Server 2008, hence the C on 2005 was allowed? Also of interest is that the Opteron 6176 TPC-C result uses 125 SSDs instead of hard disks (1300 HDs in the Xeon W5580 result).


Before comparing the Opteron 12-core with Xeon 5500, let us first compare against the previous generation Xeon 5400 quad-core.
The 2-way 12-core Opteron 6176 achieved OLTP results higher than the Xeon 5460 by 2.5X on TPC-C and 2.8X on TPC-E. These are very good results for a 3X increase in the number of cores. Now in comparing against the quad-core Xeon 5500 series, the 12-core Opteron is just marginally higher. I am inclined to think much of this is due to the Hyper-Threading capability in the Xeon 5500 series. HT was much maligned in the NetBurst architecture generation. Some people today still blindly regurgitate the advice to disable HT, not realizing this advice was applicable to the old NetBurst and not the new Nehalem architecture processors. At some point AMD may have to admit that implementing HT will be a necessity.

The price for the DL385G7 with 2×6176 processors from the TPC-H report is $1,511 for the system chassis,
$1,799 for each processor, $990 for each 8GB kit, and perhaps another $1K for comparable configuration as above.
This is very reasonable, except for the memory which seems high. Each 8GB kit should be around $500.

Magny-Cours is comprised of two six-core Istanbul die(?) each with 4×0.5 L2 cache and 6M L3.
The Istanbul die size is 346mm2, versus 540 mm2 for Nehalem-EX with 8-cores and 24M L3.
The images below were adjusted to match the die size closely,
but there is no assurance that the aspect ratios are correct.


For some reason I thought Nehalem EX was 540 mm2 when in fact the Intel website says it is 684 mm2.
The figure below shows the corrected scaling.


2010 June 21, HP ProLiant servers, and other results

HP has just announced the ProLiant DL580 G7 and DL980 G7 servers based on the Xeon 7500 series processors,
and the DL585 G7 4-way server with the 12-core AMD Opteron 6100 series (Magny-Cours).


Apparently the reason for the delay is that the 8-way DL980 G7 employs custom silicon node controllers (XNC),
and possibly so that HP could make a splash by announcing all three systems at their big annual conference,
the HP Technology Forum. The DL580 and 585 G7 are available now(?),
and the 980 G7 should be available later in Q3.


While the Intel Xeon 7500 processor allows a glue-less 8-way system,
HP felt that the design could be improved with node controllers.
The node controllers reduce snoop traffic for a majority of memory accesses,
and can achieve a 30% reduction in memory latency in some circumstances.
It should be considered that HP needed to build a custom silicon crossbar (and node controllers)
for their SuperDome2 system and the Itanium 9300 series processors,
which use the same QuickPath Interconnect (QPI) as the Nehalem processors.
There are differences in the way the Itanium and Xeon processors use QPI.
There are also differences between the node controllers for the Itanium and Xeon systems.
(The Itanium node controller implements directory-based cache coherency
and the Xeon node controller is a snoop filter.)


HP might have built a glueless 8-way Xeon 7500 system if they had not already invested the effort
to build the XNC for their Itanium systems.
This also means that HP should have the components to build a 16-way Xeon 7500 system,
meaning that if there were market demand, such a system could be brought to market.
Intel did say that there were 16-way Xeon 7500 system designs, but none have surfaced yet.


Dell has also released a 2-way TPC-E result for the Xeon 5600,
and Fujitsu released a 4-way TPC-E result for the Xeon 7500.

TPC-H 300GB: 4-way DL585 G7 vs 8-way ProLiant DL785 G6

A comparison of the TPC-H 300GB results for the 8-way ProLiant DL785 G6 and the 4-way DL585 G7 is interesting,
with the 4-way DL585G7 having 18% better performance on the Power metric.

System TPC-H Power TPC-H Throughput TPC-H Composite QphH
DL785G6 109,067.1 76,860.0 91,558.2
DL585G7 129,198.3 89,547.7 107,561.2

The significant differences between the two systems are below.
Both systems have the same number of total cores, the 8-way with 6-core processors
and the 4-way with 12-core processors.
The DL785G6 cores are 2.8GHz versus the DL585G7 at 2.3GHz, about a 20% difference.
The DL585G7 has twice the memory, 512GB versus the 256GB.
For TPC-H at SF300, and using SQL Server 2008 page compression,
256GB is not quite sufficient to encompass the entire database tables and indexes.
With 512GB, there is more than sufficient memory for data,
indexes and probably most hash join intermediate results (for minimal tempdb activity)

System DL785G6 DL585G7
Processor Opteron 8439 Opteron 6176
Sockets-Cores 8 x 6 = 48 4 x 12 = 48
Frequency 2.8GHz 2.3GHz
Memory 256GB 512GB
Storage 194 HDD 4 SSD
Windows Server 2008 EE SP1 2008 R2 EE
SQL Server 2008 EE SP1 2008 R2 EE

That the DL585G7 employs SSD storage is not expected to impact performance;
SSDs were probably used for lower cost.
The 194 15K HDDs and 12 storage enclosures in the DL785 cost $110K,
while the 4 320GB Fusion-io drives in the DL585 cost $55K.
If the DL585 had 256GB or less memory,
then the SSD storage would have moderately better performance than HDD storage.
Another significant difference is the set of improvements in Windows Server 2008 R2,
several of which have a major impact on scaling to a high number of processor cores.

The chart below shows the TPC-H power query run times for the DL585G7 relative to the DL785G6.



TPC-H Power query run times, DL585G7 relative to DL785G6


Overall, the DL585G7 with 4 Opteron 6176 processors is about 20% higher than the DL785G6 with 8 Opteron 8439 processors.
For the individual queries, several are moderately faster, 3 are much faster,
5 are about the same, and 3 are actually significantly slower.
The DL785 has faster processors, which should make all queries run faster.
It is difficult to account for differences in the system architecture,
as there may be differences in how the individual dies are connected.
The greater memory on the DL585 is expected to make certain queries run faster.
The scaling improvements in R2 (OS and SQL) might contribute significant gains in some queries,
but may also have negative effects in others.


It would be very helpful to have access to the actual execution plans,
along with execution statistics to determine if the differences
can be attributed to plan differences or differences in disk IO.

TPC-H 3000GB: 8-way Xeon 7560 vs 16-way Xeon 7460

Below are the TPC-H 3000GB results for the 8-way ProLiant DL980 G7 with the Xeon 7560 processor
and the 16-way ES7000 with the Xeon 7460.
The 32-way dual-core IBM 5GHz Power6 result is also shown.

System TPC-H Power TPC-H Throughput TPC-H Composite QphH
16 x Xeon 7460 120,254.8 87,841.4 102,778.2
8 x Xeon 7560 185,297.7 142,685.6 162,601.7
32 x Power6 142,790.7 171,607.4 156,537.3

Additional details are below:

System ES7000 DL980G7 Power 595
Processor Xeon 7460 Xeon 7560 Power6
Sockets-Cores 16 x 6 = 96 8 x 8 = 64 32 x 2 = 64
Hyper-Threading no yes 4/core?
Frequency 2.66GHz 2.26GHz 5.0GHz
Memory 1024GB 512GB 512GB
Storage 914 HDD 660 HDD 288 HDD
OS 2008 R2 DC 2008 R2 EE AIX 6.1
Database 2008 R2 DC 2008 R2 EE Sybase 15.1

The Unisys system may have been over-configured in disks and memory.
Many of the TPC-H queries involve large table (or range) scans.
If the entire database cannot be brought into memory,
then there may not be much difference in the disk IO generated with either 512GB or 1TB of memory.
More importantly, the Windows operating system and SQL Server versions match,
so there is high confidence we are seeing mostly the difference between the two processor (and system) architectures.


The IBM system may appear to be under-configured in terms of the number of disk drives.
But it does seem that other database engines are better at switching from pseudo-random to sequential scan operations,
and can work fine with fewer disks.


While the Xeon 7400 series processor core was top of the line in its time,
even the 4-way Xeon 7400 system had limited memory bandwidth (and channels).
Scaling beyond 4-way was not a simple matter.
Of course, the Xeon 7400 systems were still competitive with systems based on processors with better scalability,
but weaker single core performance.

Based on the 16-way Xeon 7460 result,
the expectation is that an 8-way Xeon 7460 would be in the range of 65,000, i.e.,
doubling the number of processors should increase performance by 1.6X.
In turn, there is sufficient reason to estimate that the Xeon 7560 is about 2.5X more powerful
than the Xeon 7460 for data warehouse usage.
This is less than the 2.77X observed in OLTP,
which is in line with expectations because OLTP derives substantial benefits from Hyper-Threading (30%?)
and data warehousing derives only a modest benefit from HT (10%?).
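
The arithmetic behind this estimate can be sketched in a few lines of Python. It assumes the 1.6X-per-doubling rule of thumb stated above and uses the two composite QphH figures at 3000GB from the results table; the variable names are illustrative only.

```python
# Composite QphH@3000GB results from the table above.
xeon7460_16way = 102_778.2
xeon7560_8way = 162_601.7

# Rule of thumb from the text: doubling the processors gives about 1.6X.
DOUBLING_GAIN = 1.6

# Work backwards from the 16-way result to a hypothetical 8-way Xeon 7460.
est_xeon7460_8way = xeon7460_16way / DOUBLING_GAIN

# Compare the measured 8-way Xeon 7560 against that estimate.
relative_performance = xeon7560_8way / est_xeon7460_8way

print(f"estimated 8-way Xeon 7460: {est_xeon7460_8way:,.0f} QphH")
print(f"Xeon 7560 vs Xeon 7460 at 8-way: {relative_performance:.2f}X")
```

Under that rule the implied 8-way Xeon 7460 figure is about 64,000 QphH, which puts the Xeon 7560 at roughly 2.5X.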


The chart below shows the TPC-H power query run times for the 8-way Xeon 7560 relative to the 16-way Xeon 7460.



TPC-H Power query run times, 8-way Xeon 7560 relative to 16-way 7460


As with the earlier comparison, there is also wide variation in the individual queries.
Many queries are 40% faster, two are about the same, two are actually slower, and one is more than 5X faster.

TPC-H 1000GB: 8-way 6-core Opteron 785G6 vs 16-way quad-core Itanium

Below are the TPC-H 1000GB results for the 8-way ProLiant DL785 G6 with the Opteron 8439 processor
and the 16-way Integrity Superdome 2 with the Itanium 9350.

System TPC-H Power TPC-H Throughput TPC-H Composite QphH
8 x Opteron 8439 95,789.1 69,367.6 81,514.8
16 x Itanium 9350 139,181.0 141,188.3 140,181.1

Additional details are below:

System DL785 G6 Superdome 2
Processor Opteron 8439 Itanium 2 9350
Sockets-Cores 8 x 6 = 48 16 x 4 = 64
Hyper-Threading no yes
Frequency 2.8GHz 1.73GHz
Memory 512GB 512GB
Storage 240 HDD 576 HDD
OS 2008 R2 EE HP-UX
Database 2008 EE Oracle 11g R2

The operating system and database engine are both completely different,
so caution is warranted in comparing the results.
Also very important is that the execution plans could also be very different in certain queries.

As the expectation is that doubling the number of processors should lead to an approximately 1.6X
performance gain, we can see that the six-core Opteron 8439 is in the same neighborhood as the
quad-core Itanium 2 9350.
The individual Opteron processor is probably a little better than the Itanium at the socket level
in the TPC-H Power test, but the Itanium has the advantage in throughput-oriented usage.


The chart below shows the TPC-H power query run times for the 16-way Itanium relative to the 8-way Opteron.



TPC-H Power query run times, 16-way quad-core Itanium relative to 8-way 6-core Opteron

As expected, there is wide variation in the individual queries.
There are differences in almost every important area: the processor and system architecture,
the operating system and the database engine.
It is not just the difference in the database engine, but also the execution plans.

Intel Xeon processors in LGA1366 design

In the last article we got acquainted with the performance of all Core i7 processors for the LGA1366 socket. It was not difficult, but rather boring: apart from the overclocking capabilities of the Extreme versions, they are all very similar to each other. In normal mode, these devices differ only in the "base" clock frequency and, to a small extent, the speed of the QPI link. So, knowing the results of two processors of the family, you can estimate the performance of any other member by simple extrapolation. We tried to introduce a little intrigue by relying on the fact that Intel, while de jure rejecting the use of DDR3 memory faster than 1066 MHz with the Core i7, de facto began to "endorse" it in the latest BIOS versions of its motherboards, but it did not work out. A more or less noticeable gain in real applications, as it turned out, can only be obtained by leaving the relative timings untouched when increasing the frequency; in all other cases, some programs even show a decrease in performance. Everything is simple and boring.

The Xeon 3500 processor family looks no less simple and boring at the moment, but the 5500 line is much more varied. First, it includes chips with a wide range of TDP values. Second, they differ in official support for different memory options and span a much wider range of QPI speeds (although, as we have already seen, neither has a significant impact on the final performance in our test mix). Third, Nehalem's signature feature, the ability not only to reduce but also to increase the clock frequency (which was not available in earlier processors), is more developed in some Xeon models than in other Xeons and all LGA1366 Core i7 processors. In contrast, some Xeon models support neither Turbo Boost nor Hyper-Threading. Even more interesting, the 5500 family includes processors not only with 8 MB of L3 cache, but also with half that amount, and there is even a dual-core model. Unfortunately, not all motherboards support the 550x models; ours simply does not start with them. So we will leave the question of their performance, and of how useful two-socket solutions are, for next time. Today we have four "full-fledged" Xeons and several related issues on the agenda. We will start with the related issues.

Turbo-Boost and Xeon 5500

When we tested the Xeon X5560, reliable information about this processor was not yet available and the motherboard BIOS was far more limited than it is now, so we had not yet mastered the method of determining Turbo Boost parameters with the help of Extreme Edition processors. We ended up with better results than expected from the behavior of the Core i7 and concluded that they were due to the use of higher-frequency memory. However, when testing the Core i7 line last time, we found that moving to DDR3-1333 gives little even to the senior models. To tell the truth, it gives nothing: in some tests the results are even lower. So why did the X-family Xeon perform better in our earlier tests than a Core i7 at the same nominal frequency?

The answer is simple. The secret once again lies in the intricacies of «boost mode». The scheme used by the Core i7 and Xeon 3500 can formally be written as +2-1-1-1: the frequency of each core is raised by two multiplier steps or by one, depending on the load (one core loaded, or two, three or all four). “Formally”, because the increase is not actually relative: each of these modes simply has its own multiplier, set in the same way as the starting one. As a result, incidentally, with Turbo Boost enabled the old trick of turning a senior processor into a junior one by lowering the multiplier no longer works: we lower the starting multiplier, but in “boost mode” the processor still climbs to the same frequencies (unless, of course, we are talking about Extreme models, where anything can be customized). The Xeon X55x0, however, «overclock» more aggressively: for them the formal scheme is +3-3-2-2. What follows from this? With any number of threads, a Xeon X55x0 can run at a higher final clock speed than a Core i7 of the same nominal frequency: at least one step higher, and in two-threaded applications two steps higher. So in terms of nominal frequency the Xeon X5570 is an analogue of the Core i7 940, but in terms of «boost mode» frequency it is already close to the Core i7 950.

And what about the E and L lines? It is even more interesting there. Some members of these subfamilies do not support Turbo Boost at all, while the rest follow the +2-2-1-1 scheme. It follows that the E5540, whose nominal frequency is 133 MHz lower, can sometimes catch up with the Core i7 920. “Sometimes” means only in two-threaded applications. On the other hand, there are plenty of those among the “typical desktop” programs that feature heavily in our methodology. Let’s see how this plays out in practice.
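The schemes above are easy to check with a few lines of arithmetic. The following is only a sketch: the function and dictionary names are ours, the base clock of 133.33 MHz is an assumption (the text rounds one multiplier step to 133 MHz), and the nominal multipliers (19 for the E5540, 20 for the Core i7 920, 22 for the X5570, 23 for the Core i7 950) are those listed in the specification tables of this article.

```python
BCLK_MHZ = 133.33  # assumed base clock; the article rounds one step to 133 MHz

# Multiplier steps added with 1, 2, 3 and 4 loaded cores (schemes from the text).
TURBO_SCHEMES = {
    "Core i7 / Xeon 3500": (2, 1, 1, 1),
    "Xeon X55x0": (3, 3, 2, 2),
    "Xeon E/L 55x0": (2, 2, 1, 1),
}


def boost_multiplier(nominal_multiplier, scheme, loaded_cores):
    """Multiplier in boost mode for the given number of loaded cores (1-4)."""
    return nominal_multiplier + TURBO_SCHEMES[scheme][loaded_cores - 1]


def boost_ghz(nominal_multiplier, scheme, loaded_cores):
    """Boost-mode frequency in GHz, rounded to two decimals."""
    multiplier = boost_multiplier(nominal_multiplier, scheme, loaded_cores)
    return round(multiplier * BCLK_MHZ / 1000, 2)


if __name__ == "__main__":
    # Compare the E5540 (multiplier 19) with the Core i7 920 (multiplier 20).
    for cores in (1, 2, 3, 4):
        print(cores,
              boost_ghz(19, "Xeon E/L 55x0", cores),
              boost_ghz(20, "Core i7 / Xeon 3500", cores))
```

With these inputs the E5540 and the Core i7 920 land on the same multiplier only when exactly two cores are loaded, which is the "sometimes" discussed above. The sketch ignores power limits, which can keep a processor from reaching these frequencies at all.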

Comparing frequencies is complicated further by the fact that the Core i7 and Xeon 5500 not only raise the frequency differently but also lower it differently, since they fit into different thermal envelopes. We gave an example in the previous article: when you try to «warm up» the processor above 80 W, the Core i7 will still run in «boost mode», while the Xeon X55x0 will turn it off. With a further increase in load, when power consumption reaches 95 W, the Xeon will already start throttling, while the Core i7 will continue to raise its frequency. In other words, power and frequency management has now reached the point where not only is it impossible to compare processors of different architectures by clock frequency “head-on”, but even identical cores begin to behave quite differently. In practice a Xeon rated at 2.93 GHz is not at all the same as a 2.93 GHz Core i7.

In passing, we can also answer the question of why Intel needed the Core i7 950 and 975 Extreme Edition in place of the 940 and 965 EE after the transition to the new stepping. For the first revision of Bloomfield, Intel could guarantee that the cores would run at 3.47 GHz within a 110 W thermal envelope: that is exactly the 965 EE or Xeon W3570 in «boost mode». The new stepping, as is often the case, improved the ability to reach higher clock speeds. Not by much, but the company can now guarantee an additional 133 MHz within the same thermal envelope. The Xeon W5580, after all, reaches 3.6 GHz with Turbo Boost. And the X5570 is even more interesting in this regard: 3.33 GHz while keeping within 80 watts. When moving the LGA1366 Core i7 to the new stepping, this “wealth” could be used in two ways: either introduce for the updated processors the same Turbo Boost scheme that is typical of the Xeon, or stay with the old scheme and raise the nominal clock frequency by the same 133 MHz. The first leads to confusion, the second does not, and so it is preferable. The price of the new «accelerated» models, naturally, stays the same as before.

And now we can look ahead a little, to the days of LGA1156. According to the information currently available, the “debut” senior processor of this family will have a frequency of 2.93 GHz and in “boost mode” will raise it to the same 3.6 GHz (maximum) as the top processors for LGA1366, while making do with a thermal envelope similar to that of the Xeon 5500 X-family. It will also cost the same as the current Core i7 950, which has a nominal frequency of 3.06 GHz but boosts only to 3.33 GHz. The latter, however, can operate in a “harder” thermal regime and has a three-channel memory controller rather than a two-channel one, which should not be discounted either. Equipment testers (us in particular) are in for very fun and interesting times in the near future. And the «tabular» official clock frequency finally risks becoming a complete abstraction, even on the mass market.

Processor                             Xeon L5520     Xeon E5540     Xeon X5560     Xeon X5570
Clock frequency, GHz (nominal/boost)  2.26/2.53      2.53/2.8       2.8/3.2        2.93/3.33
Number of cores                       4              4              4              4
L1 cache, I/D, KB                     32/32          32/32          32/32          32/32
L2 cache, KB                          4 x 256        4 x 256        4 x 256        4 x 256
L3 cache, KB                          8192           8192           8192           8192
RAM (**)                              3 x DDR3-1066  3 x DDR3-1066  3 x DDR3-1333  3 x DDR3-1333
Multiplier                            17             19             21             22
QPI                                   5.86 GT/s      5.86 GT/s      6.4 GT/s       6.4 GT/s
Socket                                LGA1366        LGA1366        LGA1366        LGA1366
(**) maximum frequency officially supported by the memory controller in the processor

Although both the Xeon 5500 and the Core i7 currently come in the same LGA1366 package, in practice they are not fully compatible. Most desktop (and not only desktop) motherboards limit 5500 support from both above and below: the top W5580 usually does not work, and neither do the most “cut-down” models, the E550x and the L5510 that joined them, with half the cache and with Turbo Boost and Hyper-Threading disabled. Still, we managed to assemble a fairly «representative» set of processors that makes the testing more or less interesting. Firstly, there are the senior members of the X-family, the most interesting ones for the “mass server” market: a single W5580 usually makes no sense, and a pair places too heavy demands on the cooling system, dissipating up to 260 W between them. Secondly, the E5540 is the closest in nominal clock speed to our «reference» representative of the previous family, the Core 2 Quad Q9300, whose results are taken as the «scale unit» in our current processor testing methodology. In addition, this model is the fastest of the processors with a TDP of 80 W, which can sometimes matter a great deal. Similarly, the L5520 is the fastest of the low-power (60 W TDP) Xeon 5500 subfamily. Its half-brother, the E5520, is the junior and cheapest of the «full-fledged» Xeon 5500s. In principle the E5520 may turn out either faster than the L5520 (do not forget its higher throttling thresholds) or slower (if the chips for the L-series are specially selected, which the company can afford to do), but on average these factors should roughly balance out.

Processor           Core i7 920    Core i7 940    Core i7 950
Number of cores     4              4              4
L1 cache, I/D, KB   32/32          32/32          32/32
L2 cache, KB        4 x 256        4 x 256        4 x 256
L3 cache, KB        8192           8192           8192
RAM (**)            3 x DDR3-1066  3 x DDR3-1066  3 x DDR3-1066
Multiplier          20             22             23
QPI                 4.8 GT/s       4.8 GT/s       4.8 GT/s
Socket              LGA1366        LGA1366        LGA1366
TDP                 130 W          130 W          130 W

(*) when the «auto-overclocking» Turbo Boost function is enabled (which is implied by default), the actual frequency of individual cores increases relative to the nominal by 133-266 MHz, depending on the load
(**) the maximum frequency officially supported by the memory controller in the processor

For comparison we took three of the Core i7 processors tested last time, all from the junior part of the line. The Extreme editions are again unsuitable for comparison: they are obviously faster than all Xeons except the W5580. But the 940 and 950 suit us perfectly: the first has the same nominal frequency as the Xeon X5570, and the second has the same boost-mode frequency. Let’s see how the roles are distributed in terms of real performance. The Core i7 920, as an obvious «reference point», also belongs on the diagrams. Incidentally, as you know, this processor does not have long to live: only until the announcement of LGA1156, after which the cheapest «desktop» LGA1366 offering will cost $562. Among the Xeon 5500 there are cheaper options, not only among the «trimmings» of the 550x series but also the quite full-fledged E5520. Right now it is not very interesting for single-socket configurations, since the 920 costs $100 less, but once the latter disappears, this model may well become attractive to some. So the question of how much slower the junior “uncut” Xeon is than the Core i7 920 in normal mode may cease to be purely theoretical, especially considering the significantly lower heat dissipation of this series.

Processors          System board        RAM
Core i7, Xeon E/L   Intel DX58SO (X58)  Kingston KVR1333D3N9K3/6G (1066)
Xeon X              Intel DX58SO (X58)  Kingston KVR1333D3N9K3/6G (1333, 9-9-9-24)

Testing

The performance testing methodology (the list of software used and the testing conditions) is described in detail in a separate article. For readability, the results on the diagrams are presented as percentages (the result of the Intel Core 2 Quad Q9300 is taken as 100% in each test). Detailed results in absolute terms are available as a spreadsheet in Microsoft Excel format.
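As an illustration of how such percentages are obtained, here is a minimal sketch. The function name and the sample numbers are hypothetical, and the handling of "lower is better" results (run times) by inverting the ratio is our assumption, not something the methodology text states.

```python
def relative_score(result, reference, higher_is_better=True):
    """Express a result as a percentage of the reference (reference = 100).

    For time-like metrics, where a lower value is better, the ratio is
    inverted so that a faster processor still scores above 100.
    """
    if higher_is_better:
        return 100.0 * result / reference
    return 100.0 * reference / result


if __name__ == "__main__":
    # Hypothetical numbers: the reference renders a scene in 120 s,
    # the tested processor in 80 s.
    print(relative_score(80, 120, higher_is_better=False))
    # Hypothetical frame rates: 40 fps on the reference, 50 fps on the tested CPU.
    print(relative_score(50, 40))
```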

3D visualization

As expected, the X5570 is faster than the i7 940 but slower than the i7 950. The opposite would be strange: the real frequency of the Xeon is only sometimes close to that of the second processor. But the small difference between the X5560 and X5570 is curious. There is an explanation, though: the junior model, for obvious reasons, works in a more comfortable power regime, so it can use Turbo Boost more aggressively. We have often seen that the «efficiency per megahertz» of the same core falls as frequency rises; this was observed even before dynamic frequency control appeared, and now it is bound to show up even more strongly. After all, the X5570 is the senior model with a 95 W thermal envelope, and that limit may already be starting to affect it.

At the other end of the table are the «junior» lines. The E5540 failed to catch up with the i7 920 after all, although it came closer than the difference in starting frequencies would suggest. And why is the difference between the L5520 and E5540 so small? Apparently the L-family processors are not «ordinary» chips with merely lowered power consumption thresholds. These really are hand-picked chips. That is why they manage to stay within 60 W with smaller losses than the E55x0 incur in fitting into their “up to 80” range.

But this group of tests is not the best thing to feed Nehalem. The load is mostly two-threaded (although quite heavy, so it easily “warms up” a couple of cores), and, as we saw last time, fast memory brings no special dividends. Only the active use of Turbo Boost saves the day, allowing these processors to pull slightly ahead of a Core 2 of the same or even lower nominal clock frequency (recall that here and everywhere else one hundred scale units is the Core 2 Quad Q9300). Let’s see what changes with a different kind of load.

Rendering 3D scenes

The situation becomes more acute. With all cores fully loaded, a Xeon X-series processor may have a 133 MHz advantage over an «equal frequency» Core i7, but «may» does not mean «has»: power consumption has to be taken into account. As a result, the «leading» group lined up by nominal clock frequency, and very precisely: 6 points per step. Among the «laggards» everything is much less clear-cut than theory would predict. Again, only the performance of the L5520 is pleasing to the eye: this is an excellent processor for low-power servers. That holds even for single-socket machines, where it can compete on equal terms or better with the L3360 (2.83 GHz), not to mention dual-socket ones, where the fastest of the low-power Xeons on the Core 2 architecture ran at just 2.5 GHz, the same as our Q9300, and the gap to that is visible to the naked eye.

Scientific and engineering calculations

These applications are more conservative in their multithreading support, but the threads here are “simpler” than in the first group of tests. The main outcome is that the results are extremely close to theory. Theory, it turns out, is not always as dry as the literary classics claim: sometimes it coincides exactly with practice.

Raster graphics

Curiously, the picture is very similar to rendering. It is another confirmation that “being able to” does not mean “doing”. More precisely, not always: this holds fully for the X-line, whereas the E5540 almost caught up with the Core i7 920 and was held back only by a few subtests that load fewer or more than two cores.

Data compression

The results are somewhat ambiguous but easily explained. Firstly (and this is no longer a secret to anyone), for archivers “multi-threading” still means “two-threading”. Secondly, the load as such is not very high: these programs are poor «warmers». Thirdly, on the other hand, they are among the best examples of programs that are sensitive to the speed of the memory subsystem. Together these factors mean that the Xeon X-series not only overtake their “equal-frequency” colleagues but can also make up a difference of one multiplier step. The other subfamilies have only one of these stimulants, and in a smaller dose, hence the corresponding result.

Compilation (VC++)

When comparing processors with similar specifications and the same architecture (all of them fast, moreover), it becomes apparent that they all compile the test project very quickly, so that a second or two already makes a difference. Even averaging the results over several runs does not always help: plus or minus one point can still creep into the final score. The new processors are too fast, for which they (together with their developer) hardly deserve scolding; programmers will most likely hold the diametrically opposite opinion. Roughly this result (ignoring the tiny fluctuations) could have been expected from the start.

Interpretation (Java)

It is hard to say whether there are users for whom Java virtual machine speed is as critical as compilation speed is for programmers (on modern desktop computers, of course, not on mobile devices), but the behavior of the processors in this test has changed little and continues to please the eye of a speed lover. Especially since, as we can see, speed within a single family can be raised not only by a «dumb» increase in the starting clock frequency but also by controlling it dynamically, while staying in harmony with power consumption and more.

Audio encoding

In terms of practical use the situation is similar here: the days when you had to leave your computer running overnight to encode several albums to MP3 are long gone, and now everything is limited mainly by how fast the source material arrives (whether over the network or from discs). But the behavior of these programs is somewhat different. As we can see, audio encoding tasks are not only still very processor-dependent, they also load the processor heavily. The subtleties of more or less aggressive clock boosting play no role here: everything lines up exactly in order of starting clock frequency. But always, of course, very fast.

Video encoding

And with video everything is the same. Here, though, the practical usefulness of high results is beyond dispute, since producing finished material from the source is still a rather lengthy procedure.

Gaming 3D

The situation with games is logical, though somewhat unpleasant for advocates of buying powerful processors for a gaming computer: once again we see that the result depends on the CPU, but only slightly. After this you finally understand that NVIDIA’s recent much-discussed recommendations are not pure PR. It really is time for gamers to think about multi-GPU setups rather than high-end processors. Especially with modern cores: as you can see, even the junior Nehalem in its «uncut» version still outperforms the mid-range models of the previous family, so more is not required. Now it is the video card and only the video card.

Total

What we like about averaging results across all tests is that it smooths out the peculiarities of individual applications, leading (given a large number of tests, of course) to a quite reasonable «average temperature in the hospital». Otherwise it would be too difficult to compare processors with such different attitudes to changing their frequency. Sometimes the Xeon raises it more aggressively, sometimes it is forced to «limit its abilities» because of a narrower thermal envelope, and in the end it all evens out. It can be seen that in final performance the X5570 is still closer to the Core i7 940 than to the 950, although it is faster than the former.
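A small sketch shows how such a total score smooths out differences between test groups. We assume a plain arithmetic mean over per-group scores; the methodology text does not specify the averaging method, and the group names and numbers below are invented purely for illustration.

```python
from statistics import fmean


def overall_score(group_scores):
    """Average per-group scores (reference = 100) into one total.

    Assumption: an unweighted arithmetic mean over all test groups.
    """
    return fmean(group_scores.values())


if __name__ == "__main__":
    # Hypothetical processors: A gains more in lightly threaded groups,
    # B gains more under full load.
    cpu_a = {"archivers": 140.0, "rendering": 150.0, "encoding": 145.0}
    cpu_b = {"archivers": 132.0, "rendering": 156.0, "encoding": 147.0}
    print(overall_score(cpu_a), overall_score(cpu_b))
```

The two invented profiles differ by several points in individual groups yet end up with the same total, which is the smoothing effect described above.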

Once again we can be happy for Intel’s engineers, who have given the company such a flexible way to vary the performance and power consumption of its processors. And once again we can be upset that a method of performance assessment as simple and unambiguous as comparing clock frequencies has finally become a thing of the past, even among processors of the same architecture. When comparing chips of different families, as we see once again, the situation is even more acute: the “efficiency per real megahertz” of Core 2 and Nehalem cores does not differ that much, yet all the architectural improvements together can easily provide up to 25-30% more performance at the same declared frequency.

We would like to thank the Russian representative office of Kingston Technology for its assistance in equipping the test benches.

Xeon 5500 server processors | KV.by

A presentation of the new Intel Xeon 5500 server processor family was held in Moscow. Kirk Skaugen, general manager of the server platforms division, came to Russia for the occasion.

Intel called the Xeon 5500 the most notable event in the server segment since the Pentium Pro, which appeared 15 years ago. The new processors can automatically adjust to a set level of power consumption, and they speed up data center transactions and queries to customer databases. In addition, they will play a key role in scientific research where supercomputers are the main instrument. In all these cases the processor will provide high energy efficiency and reduce electricity bills. According to the head of the Belarusian-Russian «Skif-Grid» program, scientists have already built a system on the new processor whose performance is twice that of the previous generation.

Company representatives demonstrated the superiority of the new processor in solving complex tasks: in this case they tested face recognition software (the Face.com service) and a system for calculating gas flow around moving objects. The new-generation Xeon showed twice the performance of the server processors of previous years.

The Xeon 5500 series (codenamed Nehalem-EP) supports Hyper-Threading and next-generation Intel Virtualization Technology (VT), improved by the Extended Page Table mechanism (it allows the system to adapt to a wide range of workloads).

Platforms based on the Xeon 5500 series offer three times the bandwidth of their predecessors, which makes it easy to handle a variety of workloads. A new feature, Intel Turbo Boost technology, raises system performance depending on the customer’s workload and environment by dynamically increasing the clock frequency of one or more computing cores.

The Xeon 5500 also features an improved automated power management system, which lets customers cut electricity costs. At idle it consumes only 10 watts, so system power consumption in this state is 50% lower than in previous-generation systems. New integrated power gates, based on Intel’s unique high-k metal gate technology, make it possible to switch off idle processor cores independently.

The Intel Xeon 5500 Series can be in any of 15 automated operating states. This significantly improves power management by adjusting system power consumption to real-time changes in throughput without detriment to performance.

These and other capabilities of the processor also help minimize the operating costs of systems based on the new processors. In the economic crisis, customers can recoup their investment in just 8 months.

The processor also provides additional virtualization benefits, allowing servers of different generations to be combined in one virtual pool in order to improve virtual machine failover, load balancing and disaster recovery. The new Intel Nehalem microarchitecture, together with next-generation Intel Virtualization Technology, delivers up to 2.1 times higher virtualization performance and reduces the associated overall latency by up to 40%.

According to company representatives, the Internet giants Yahoo and Google have already placed orders for the new server processors. And Mikhail Fadeev, director of system administration at «Yandex», noted that the largest Russian search engine has already ordered a thousand servers with the new processors. According to him, the crisis has only added to the search engines’ workload: the number of queries has almost doubled as people look for work and for opportunities to start their own business.

Eduard TROSHIN,
Anatoly ALIZAR

P.S. Detailed technical information about the new Xeon 5500 processors can be obtained on April 14 during a live web chat with Intel experts at www.intel.ru/ITGalaxy from 12.00 to 14.00 Minsk time.

Issue: No. 14 of 2009

Section: Display-press

    So. Almost in the very center of Moscow, in the conference hall of a hotel, representatives of leading IT resources and publications gathered. All of them came to the event mentioned above, which was opened by Dmitry Konash, Intel’s regional director for Russia and other CIS countries. He introduced Kirk Skaugen, Vice President of the Intel Architecture Group and General Manager of the Data Center Group, who gave a presentation on the main topic.

    In … open x86 architecture.

    Later, Kirk noted the rapid growth of the hardware base of distributed («cloud») computing systems: with 2.5 billion users, the number of virtual servers has exceeded 1 billion. There is also a rapid increase in the popularity of the new processors, a trend the speaker called the «Nehalem effect». I won’t drag this out; let’s move on to the new chips.

    But first, a quick primer. There are several series:
    Intel Xeon 3000: single-processor systems
    Intel Xeon 5000: dual-processor systems
    Intel Xeon 7000: multi-processor systems

    The name of each model consists of several digits and a letter prefix. The digits may be hard to remember at first, but the letters are simpler:
    X: performance level
    E: mainstream processors (rack-optimized)
    L: low-power level, the most energy-efficient solutions

    Prior to the official announcement, the Xeon 5600 series was codenamed Westmere-EP, where Westmere corresponds to the 32 nm process technology of this family and EP stands for Efficient Performance (energy-efficient performance). The Xeon 7500 series was codenamed Nehalem-EX before the announcement: here Nehalem corresponds to the microarchitecture of the same name and the 45 nm process, and EX stands for Expandable, that is, expandable and scalable. It should be noted that this designation is fully consistent with the essence of the new models.

    5600

    The Intel Xeon 5600 series is a new generation of 32 nm processors based on the Intel Nehalem microarchitecture. They use second-generation metal gate transistors with a high-k gate dielectric, which increase logic switching speed and reduce power consumption.
    The new processors (compared to the Intel Xeon 5500) allow you to replace 15 single-core servers with one system with a payback period of five months and allow you to use energy more efficiently.

    A dual socket server equipped with an Intel Xeon L5640 (60W) can provide the same level of performance as a system equipped with the previous generation 95W Xeon X5570 processors, however, power consumption will be 30% lower.

    At the same time, the new Xeon 5600 series managed to set 12 new world records in the segment of workstations and two-socket servers (according to benchmarks from various manufacturers).

    7500

    In turn, Intel Xeon 7500 series processors have 4, 6 or 8 cores and can process up to 8, 12 or 16 threads simultaneously. A four-socket platform thus has up to 32 cores and 64 threads, and an 8-processor platform up to 64 cores and 128 threads.
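    The platform totals follow directly from the per-processor figures. A minimal sketch (the function name is ours; two threads per core corresponds to the 8 cores and 16 threads quoted above):

```python
def platform_totals(sockets, cores_per_cpu, threads_per_core=2):
    """Return (total cores, total hardware threads) for a multi-socket platform."""
    cores = sockets * cores_per_cpu
    return cores, cores * threads_per_core


if __name__ == "__main__":
    for sockets in (4, 8):
        cores, threads = platform_totals(sockets, cores_per_cpu=8)
        print(f"{sockets} sockets: {cores} cores, {threads} threads")
```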

    However, the maximum number of sockets that can WORK is 256; it’s hard to imagine where such performance could really be needed … but there are rumors that such places exist.

    At the moment the most expensive (and most «powerful») processor is the 8-core Intel Xeon X7560: eight cores at 2.26 GHz, 24 MB of cache, a full set of technologies and a 130 W TDP, all for $3838 in batches of 1000 🙂 TDP across the series ranges from 95 to 130 watts depending on the model.

    New features allow you to replace up to 20 four-socket single-core servers with a single Intel Xeon 7500 server while maintaining the same level of performance.

    Among other things, the new chips carry more than 20 new reliability, availability and serviceability (RAS) features, including MCA recovery (which, working at the hardware level together with the operating system, makes it possible to isolate fatal memory errors and keep the system as a whole running). As Kirk noted earlier, this is the first time the x86 architecture has been able to detect and isolate multi-bit memory errors, which were fatal for previous-generation systems, the Xeon 7400 series in particular, and led, for example, to the “blue screen of death” (BSOD). Alexey Rogachkov volunteered to prove it. Right in the hall, he drove servers based on three generations of processors (Xeon 5500, 5600 and 7400) into the «blue screen of death» one after another, using a special script that triggers an error in system memory. The new Xeon 7500, by contrast, took such misfortunes calmly and kept working.

    Intel Xeon 7500 processors support 4x more memory and 8x more memory bandwidth than the Intel Xeon 7400: up to 16 memory slots per processor and 1 terabyte (a thousand-plus gigabytes) of memory on a 4-socket platform. In addition, support for Intel virtualization technologies, including new I/O virtualization technologies and Intel Virtualization Technology (VT) FlexMigration, enables live migration of virtual machines across all Intel Core microarchitecture platforms. There is also support for Intel Smart Cache technology, four Intel QPI links, and everyone’s favorite Intel Turbo Boost.
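    The memory figures above imply a particular module size, which is easy to verify. This is only a back-of-the-envelope sketch: the function name is ours, and treating 1 terabyte as 1024 GB spread evenly over all slots is an assumption.

```python
def memory_per_slot_gb(total_gb, sockets, slots_per_cpu):
    """Module size needed to reach total_gb with every slot populated."""
    return total_gb / (sockets * slots_per_cpu)


if __name__ == "__main__":
    # 4 sockets x 16 slots = 64 slots sharing 1024 GB.
    print(memory_per_slot_gb(1024, sockets=4, slots_per_cpu=16))
```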

    Xeons of the 7500 series, according to benchmarks from other manufacturers (Cisco, Dell, Fujitsu, IBM, NEC and SGI), set as many as 20 world records.

    More details about the records can be found on the corresponding page, but otherwise, the new chips are on average 3 times faster than the previous generation processors. There has never been such a «jump» in the history of the Xeon line.

    Prices for the 7500 and 5600 series processors, depending on the model, range from $744 to $3692 each in batches of 1000 units. All additional information is on a separate page.

    After the speech of the foreign guest, the microphone went to representatives of companies that use high-performance systems or even tried new processors. Lukoil-Inform was one of the first to «feel» the new Xeons.

    Denis Neshtun, Deputy Chief Engineer of the company: «Progress in the development of the x86/x64 platform, a good example of which is the Intel Xeon 7500-based server we tested, shows that almost any enterprise-class task can be performed on this platform with the required levels of performance and fault tolerance. At the same time, the final cost of the solution looks extremely attractive to customers».

    Alexander Shestakov, Rector of South Ural State University (SUSU): «SUSU is an innovative university that invests significant resources in developing its scientific and technical potential. The SUSU Supercomputer Center today has two supercomputers with a combined performance of 40 teraflops: “SKIF-Ural” on Intel Xeon E5472 processors and “SKIF-Aurora” on Intel Xeon 5570 processors. This has determined the main direction of the center’s development: the use of high-performance computing for solving industrial problems».

    «The use of supercomputer simulation can significantly reduce the cost of developing new types of products and technologies.»

    The speech of Vasily Shelkov , CEO of Rock Flow Dynamics seemed to me the most illustrative. Well, or his slides were the most interesting, or something 🙂 In general, they tested the new processors in modeling the terrain of a real deposit of natural resources.

    «The launch of the new six-core Intel Xeon 5600 and eight-core Intel Xeon 7500 processors opens a fundamentally new stage in the effective modeling of hydrodynamic filtration processes for oil and gas fields. The speed-up of parallel hydrodynamic calculations on multi-core, multi-processor systems depends almost equally on the number of cores and on the speed of the memory channels. Our specialists tested such servers, with 32 processor cores, on models of real fields. They sped up the calculation by a factor of 21, which is 3 to 4 times better than the results achieved on the previous generation of four-processor servers, and delivered performance one and a half to two times higher than anything previously observed on cluster MPI systems with 32 and even 64 nodes.

    For dual-processor systems, simply replacing existing quad-core processors with six-core processors of the new generation gives an automatic 20-25% increase in performance, without the need to replace the computing system completely», said Vasily.

    These were just a few of the possible use cases for the new systems.

    Interesting arithmetic

    As the saying goes, the new processors are “more than just valuable fur”… Using the new Xeons will give IT managers a number of benefits: a quick return on investment, a significant reduction in energy costs, and lower costs for expanding and maintaining data centers and increasing their computing power. By replacing legacy platforms, companies can dramatically reduce the number of servers and make significant savings without sacrificing processing power. There is also an opportunity to get more computing power from existing premises at the same price. So as not to make unsupported claims, here are the results of some calculations.

    In 2005, 343407 servers were shipped to Germany. If these systems based on single-core Intel Xeon (3.33 GHz) are replaced by systems with Intel Xeon 7500, the payback period will be 9 months. In addition, the following results will be obtained:
    — 17170 systems will be required for replacement;
    — A 95 percent reduction in total electricity costs will be achieved;
    — TCO reduction will be €3.