Intel Xeon Processors Price List

Intel Xeon 2600 Processors Price List | Lowest Price in India

Rated 4.56 out of 5 based on 9 customer ratings (9 reviews)

  • Grade ‘A’ Quality Processors
  • Low Price Guarantee
  • 90 Days Warranty
  • Quick Delivery Anywhere in India
  • 24/7 Live Tech Assistance

₹ 3,499.00 (regular price ₹ 3,999.00, -13%)

Buy Now


Description

Intel Xeon 2600 Series Processors Price List

Processor Family Description Price Buy Now
E5-2600 V1 Family Intel Xeon Processor E5-2660 (8-cores, 20MB Cache, 2.20 GHz, 95W) Rs. 11,059/- Buy Now
E5-2600 V1 Family Intel Xeon Processor E5-2670 (8-cores, 20MB Cache, 2.60 GHz, 115W) Rs. 11,699/- Buy Now
E5-2600 V2 Family Intel Xeon Processor E5-2660 V2 (10-cores, 25MB Cache, 2.20 GHz, 95W) Rs. 13,566/- Buy Now
E5-2600 V2 Family Intel Xeon Processor E5-2650L V2 (10-cores, 25MB Cache, 1.70 GHz, 70W) Rs. 15,699/- Buy Now
E5-2600 V2 Family Intel Xeon Processor E5-2651 v2 (12-cores, 25MB Cache, 1.80 GHz, 70W) Rs. 15,699/- Buy Now
E5-2600 V2 Family Intel Xeon Processor E5-2420 v2 (6-cores, 15MB Cache, 2.20 GHz, 80W) Rs. 10,499/- Buy Now
E5-2600 V3 Family Intel Xeon Processor E5-2650L V3 (12-cores, 30MB Cache, 1.80 GHz, 64W) Rs. 32,599/- Buy Now
E5-2600 V3 Family Intel Xeon Processor E5-2683 v3 (14-cores, 35MB Cache, 2.00 GHz, 120W) Rs. 56,599/- Buy Now
E5-2600 V3 Family Intel Xeon Processor E5-2678 v3 (12-cores, 30MB Cache, 2.50 GHz, 120W) Rs. 49,499/- Buy Now
E5-2600 V3 Family Intel Xeon Processor E5-2650L V3 (8-cores, 20MB Cache, 2.00 GHz, 95W) Rs. 9,759/- Buy Now
E5-2600 V4 Family Intel Xeon Processor E5-2620 v4 (8-cores, 20MB Cache, 2.10 GHz, 85W) Rs. 32,599/- Buy Now
E5-2600 V4 Family Intel Xeon Processor E5-2630L v4 (10-cores, 25MB Cache, 1.80 GHz, 55W) Rs. 45,599/- Buy Now
E5-2600 V4 Family Intel Xeon Processor E5-2609 v4 (8-cores, 20MB Cache, 1.70 GHz, 85W) Rs. 19,599/- Buy Now
E5-2600 V4 Family Intel Xeon Processor E5-2630 v4 (10-cores, 25MB Cache, 2.20 GHz, 85W) Rs. 35,199/- Buy Now


Check out the Intel Xeon 2600 processors price list on our website, which includes technical descriptions and detailed specifications, and compare it against the updated Intel processor price list covering all server processors. We provide a genuine price list so you can quickly compare all available E5-2600 series (v1, v2, v3, v4) processors and check our prices against other vendors online. Find Intel Xeon 2600 processors at the lowest prices, only at Server Basket, with Grade A quality hardware that exactly matches your Dell/HP/IBM server.

Intel Xeon 2600 Processors At Lowest Price

Our Intel Xeon 2600 processors price list includes all the high-end models at the lowest possible prices. Features such as multiple cores per processor (up to 22 on the v4 series), Intel Turbo Boost Technology 2.0, Intel Data Center Manager, Intel Node Manager, and Advanced Vector Extensions come as standard with every Intel Xeon 2600 processor.

Matching-Pair Available

Our wide range of Intel Xeon E5-2600 processors, with detailed configurations and prices, helps you find the right match for your Dell/HP/IBM server. Check out all the Xeon 2600 processors across the v1, v2, v3, and v4 series, with clock speeds ranging from 1.7GHz to 3.3GHz and cache sizes from 10MB to 35MB. If you need, say, a high-end eight-core Xeon v2 series processor at 3.30GHz with 25MB cache for your server, we promise to supply an exact matching pair to grow your business.

Add Power To Dell/HP/IBM Server

Take your server performance to the next level by upgrading it with an Intel Xeon 2600 series processor. The advanced technologies in these processors, along with high core counts, bring both power and performance to your server. Servers based on Intel Xeon 2600 processors let you create, deploy, execute, and deliver solutions faster than ever. Servers that support Xeon 2600 series CPUs include the Dell PowerEdge R620, R630, R720, R720xd, and R730; HP ProLiant DL380p Gen8, DL360p Gen8, BL460c Gen8, and ML350p Gen8; and IBM x3650 M4 and x3550 M5, among others.

All Intel Xeon E5-2600 Series CPUs Available

Whether you are upgrading an old server or buying a new system for your business, the Intel Xeon E5-2600 series processors can handle almost any task with ease. Server Basket provides a wide range of Intel 2600 series CPUs, including the Intel Xeon E5-2620, E5-2630, E5-2637, E5-2640, E5-2643, E5-2650, E5-2660 v2, E5-2650L v2, E5-2650L v3, E5-2620 v4, E5-2630L v4, and more.

Easy Buyback Policy

We have an easy buyback policy if you want to resell your old or unused Intel Xeon 2600 processors. Many customers prefer to sell unused processors to clear out space and earn some money, and these processors can meet the needs of small businesses or start-ups in a budget-friendly and eco-friendly way. So we offer an easy buyback whenever you intend to sell: just call us, email us, or ping us on live chat with the details of your request. We will check the working condition of the device and offer you a quote right away.

Low Cost Xeon E5 2600 Processor Add-Ons

Get low-cost CPU add-ons such as heat sinks, cooling fans, and thermal paste at a discounted price from Server Basket. We only provide add-ons sourced from reputed brands. These add-ons help maintain performance without increasing your power bills.

Grade ‘A’ Quality

We list only genuine, branded Intel Xeon 2600 series processors on our price list, with leading hardware and high-quality features, so there is no need to worry about counterfeit Intel Xeon processors or components. The Intel Xeon 2600 processor family supports up to 768GB of system memory, hardware-assisted security, advanced encryption, and Intel Trusted Execution Technology. Each processor model on the price list comes with a standard warranty, and we also supply Grade A quality spare parts in case of replacement issues.

Free Remote Installation Support

You might not know how to install an Intel Xeon E5-2600 series processor into your server, and we understand you may need technical assistance to fit the CPU correctly. For your peace of mind, Server Basket provides free remote installation support. Once the processor is delivered, contact us via phone call, live chat, or email to get in touch with our expert team, who will guide you through the installation process.

Quick Delivery All Over India

We offer quick shipping to any location across India, including Delhi, Mumbai, Hyderabad, Chennai, Bangalore, Noida, Lucknow, Pune, Nagpur, Agra, and Patna. Simply complete the purchase by entering your detailed delivery address and making the payment. We process your order immediately and initiate shipping; it is entirely our responsibility to deliver your processor safely to your doorstep, anywhere in India.

Reach Out to Us Via

[email protected]

+91 733-733-0401

Chat Now

Reviews (9)

Average Rating: 4.56 out of 5, based on 9 customer ratings (9 reviews)

5 Star: 55.56%
4 Star: 44.44%
3 Star: 0%
2 Star: 0%
1 Star: 0%

FAQ

Yes, we analyzed the current market trends and created this Intel Xeon processor list. We came up with the best Intel Xeon processor price in India. Contact our team if you have any doubts regarding the displayed CPU price list.

Along with reasonable Xeon CPU prices, we offer many benefits to our clients. We offer free installation and technical support along with quick delivery services to any city in India. We strive to make our customers comfortable.

Yes, we do offer special Intel server price discounts on bulk orders. Take a look at our Xeon server prices, and you’ll see that we offer better Intel Xeon prices in India compared to other vendors in the market.

Yes, the Intel Xeon 2600 processors you buy from us are in perfect working condition and can be used right out of the box. We check the quality of our processors by testing them in extreme conditions before dispatching them.

No, we don’t charge any extra money for the installation assistance we provide. You only have to pay the Intel Xeon server price you signed up for. Contact our team when you receive the processor, and they’ll help you with installation.

This Intel server CPU can support Dell, HP, and IBM servers. Contact us to know if this Intel 2600 processor can be installed in your server model or buy an Intel processor from our excellent processor collection at the best prices.

3rd Gen Intel Xeon Scalable Ice Lake SKU List and Value Analysis

Intel Ice Lake Xeon In Hand

The 3rd Gen Intel Xeon Scalable line’s naming is a mess. In this article, we are going to have our traditional SKU List and Value Analysis piece, but this is one of two for the generation. This one is our 3rd Gen Intel Xeon Scalable Ice Lake SKU List and Value Analysis but we previously had the 3rd Gen Intel Xeon Scalable Cooper Lake SKU List and Value Analysis. Cooper Lake is a 2020-era 14nm product driven by Facebook while Ice Lake is the 2021 mainstream platform. If that sounds unreasonably confusing, it is. Still, let us get to the analysis piece.

Reading a 3rd Gen Intel Xeon Scalable Model Number

We wanted to take a moment and discuss what the model numbers mean. Specifically, we will use the Intel Xeon Platinum 8380 as our example SKU. With the 8380 we get:

  • 8 – this tells us we have a “Platinum” level SKU
  • 3 – 3rd Generation Intel Xeon Scalable
  • 80 – The model designation

Since there are both the Platinum 8380 (Ice Lake) and the Platinum 8380H/ 8380HL (Cooper Lake), you may see an H or an L at the end of a 3rd Gen SKU. These mean:

  • H – Cooper Lake/ Cedar Island platform
  • L – High Memory Enabled SKU for 4.5TB/ socket support

Please note, the Ice Lake processors dropped the “L” for extended memory support. There is also a “C” for “Custom” which is used in the Platinum 8321HC 88W 26C part for Facebook.

With this generation, not only do we have the “Ice Lake” designation, but Intel has made roughly two-thirds of the SKUs on Ark “special” by adding additional letters after the four digits.

  • Y – Supports Intel Speed Select Technology Performance Profile 2.0 (SST-PP)
  • N – Networking Optimized
  • P – IaaS Cloud Specialized Processor
  • V – SaaS Cloud Specialized Processor
  • Q – Liquid Cooling Optimized
  • M – Media Processing Optimized
  • T – Long-life and Thermal friendly
  • S – SGX 512GB secure enclave (Some 512GB parts)
  • U – Usually means single-socket only, or “uni-processor”; a single-socket N SKU keeps the N instead

A few notes here. First, M in the 1st and 2nd Gen parts denoted “Medium” memory levels. In response to AMD EPYC, we saw the 2nd Gen Intel Xeon Scalable M SKUs Discontinued, but now the letter means media processing. This was in the full table Intel sent us for launch, but the Platinum 8352M is not on Intel Ark so we are going to exclude it from our analysis. Likewise, the Platinum 8362 was in the original table but not on Ark.
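To make the decoding scheme concrete, here is a minimal sketch (our own illustration, not an Intel tool) that splits a 3rd Gen SKU string such as 8380, 6338N, or 8351N into its tier, generation, model number, and suffix meaning. The suffix table simply mirrors the letter list above, and the helper names are hypothetical.

#include <stdio.h>
#include <string.h>
#include <ctype.h>

/* Hypothetical helper that mirrors the suffix list above; not an Intel API. */
static const char *suffix_meaning(char c) {
    switch (toupper((unsigned char)c)) {
    case 'Y': return "Speed Select Technology Performance Profile 2.0";
    case 'N': return "Networking optimized";
    case 'P': return "IaaS cloud specialized";
    case 'V': return "SaaS cloud specialized";
    case 'Q': return "Liquid cooling optimized";
    case 'M': return "Media processing optimized";
    case 'T': return "Long-life / thermal friendly";
    case 'S': return "512GB SGX enclave";
    case 'U': return "Single socket (uni-processor)";
    case 'H': return "Cooper Lake / Cedar Island platform";
    case 'L': return "High memory (4.5TB/socket) SKU";
    default:  return "No special designation";
    }
}

static void decode_sku(const char *sku) {
    if (strlen(sku) < 4 || !isdigit((unsigned char)sku[0])) {
        printf("%s: not a 4-digit Xeon Scalable model number\n", sku);
        return;
    }
    const char *tier = (sku[0] == '8') ? "Platinum" :
                       (sku[0] == '6' || sku[0] == '5') ? "Gold" :
                       (sku[0] == '4') ? "Silver" : "Bronze/other";
    printf("%s: %s tier, generation %c, model %.2s", sku, tier, sku[1], sku + 2);
    for (const char *p = sku + 4; *p; ++p)
        printf(", %s", suffix_meaning(*p));
    printf("\n");
}

int main(void) {
    decode_sku("8380");   /* Platinum, 3rd Gen, model 80 */
    decode_sku("6338N");  /* Gold, 3rd Gen, networking optimized */
    decode_sku("8351N");  /* note: for this one SKU, N also implies single socket */
    return 0;
}

Running it prints, for example, “6338N: Gold tier, generation 3, model 38, Networking optimized”, which is the mental decoding the rest of this article assumes.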

As a quick update, we have a video on the current state of naming conventions:

Here is the original table from our Intel Xeon Ice Lake launch coverage:

3rd Generation Intel Xeon Scalable SKU Stack April 2021

If you get into the detail of the letters above, and the table, you may get the feeling that someone at Intel gets a bonus for every new category or letter added to the naming scheme. One can liquid cool any of these processors, but there is a Q category now.

3rd Gen Intel Xeon Scalable Ice Lake Nuance

One of the big changes in this generation is that the Ice Lake series only supports up to two sockets. Whereas AMD EPYC is mostly differentiated by core counts, clock speeds, and cache, Intel goes much further, and the multitude of letters does not even cover all of it.

3rd Gen Intel Xeon Scalable Ice Lake SKU List And Value Analysis Nuance

Beyond the U SKUs for a single socket, Intel has a few main points of differentiation:

  • SGX Enclave Size
  • UPI Links
  • UPI Speed
  • DDR4 Speed
  • Intel Optane PMem 200 Support

As can be seen in the chart above, and largely unlike AMD, Intel varies a number of aspects of its chips to create artificial segmentation. The Xeon Silver 4316 does not support Optane PMem 200, but the Silver 4314 does, even though the Silver line is limited to DDR4-2666, a speed from the 2017 1st Generation Xeon Scalable era. A Xeon Platinum SKU can have a maximum SGX secure enclave of 8GB, 64GB, or 512GB.

The biggest challenge with how Intel does this, and how its competitors (AMD, and Ampere on the Arm side, for example) do not, is that if one deploys an Intel Xeon Scalable Ice Lake processor and a new workload emerges a year later, the hardware potentially cannot be used for it. For example, if a workload requires a 9GB SGX secure enclave but you have a $3,950 Platinum 8358P optimized for cloud/IaaS, you would need to buy new servers or go through a CPU upgrade process.

3rd Gen Intel Xeon Scalable Ice Lake SKU List and Value Analysis

Getting into our more traditional value analysis, here is what the table looks like. Again, please keep in mind we removed the two SKUs mentioned above because they were not on Intel Ark with listed MSRPs.

3rd Gen Intel Xeon Scalable Ice Lake SKU List And Value Analysis Small Full

There are a few key points here. First, the cost per core seems to have fallen in this generation while the TDP has generally gone up. We will also note that pricing does not necessarily follow the numerical order. For example, the Xeon Gold 6338 and its 6338T/ 6338N variants carry lower model numbers than the Xeon Gold 6346, yet the Gold 6338 is priced higher than the 6346. The key point to remember is that a higher number does not necessarily mean a chip costs more, nor that the higher-numbered part has more cores.

Something else that stands out with this generation is that clock speeds have actually taken a dip. The Intel Xeon Gold 6230R was a 26-core part with a 2.1GHz base and 4.0GHz maximum turbo clock. The Xeon Gold 6330 is a 28-core part with a 2.0GHz base and a 3.1GHz maximum turbo. Where 3.8-4.0GHz turbo clocks had been common, Ice Lake frequencies are generally down while core counts are generally higher. This matters for per-core performance: Intel will state that, at the same frequency, the Ice Lake core is several percent faster than the Cascade Lake core that preceded it, but the Ice Lake parts often run at lower clock speeds, which negates that microarchitectural benefit.

3rd Gen Intel Xeon Scalable Ice Lake SKU List And Value Analysis Large Traditional

As we have seen for generations, the Xeon Silver and lower-end Gold series tend to be the best values in the line. There are also some core/ frequency-optimized SKUs, such as the Xeon Gold 6334, which is priced at around $277 per core because it can help save on per-core licensing costs.
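As a back-of-the-envelope illustration of the per-core math (our own sketch; the MSRP figures below are assumptions for the example rather than numbers quoted from Intel Ark):

#include <stdio.h>

/* Illustrative figures only; check Intel Ark for current list prices. */
struct sku { const char *name; double msrp_usd; int cores; };

int main(void) {
    struct sku list[] = {
        { "Xeon Gold 6334 (assumed ~$2,214)", 2214.0,  8 },  /* frequency-optimized */
        { "Xeon Silver 4314 (assumed ~$694)",  694.0, 16 },  /* value-oriented */
    };
    for (unsigned i = 0; i < sizeof list / sizeof list[0]; ++i) {
        double per_core = list[i].msrp_usd / list[i].cores;
        printf("%-35s %2d cores  $%.0f/core\n", list[i].name, list[i].cores, per_core);
    }
    return 0;
}

With per-core-licensed software, the licensing bill often dwarfs the CPU price, which is why a high $/core part like the Gold 6334 can still be the cheaper overall choice.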

We generally look at an SMT/ Hyper-Thread as roughly a 30% performance addition over a non-HT core. In this generation, Intel has embraced Hyper-Threading across the board, likely due to increased competition, which means Intel needs to bolster its thread counts. Looking at the dual-socket-capable SKUs, Intel does not actually have a Xeon Bronze in this series; instead, the line starts with the Intel Xeon Silver 4310 and 4309Y.

3rd Gen Intel Xeon Scalable Ice Lake SKU List And Value Analysis Small 2P

Intel also has three single-socket-only SKUs. Two of them carry the “U” suffix, as they are uni-processor-only parts. One SKU in particular is different: the Xeon Platinum 8351N. This N part, unlike the Xeon Gold N parts (Gold 6338N, 6330N, and 5318N), is a single-socket-only part. It is also listed with only DDR4-2933 support, unlike the other Xeon Platinum parts, which are DDR4-3200 aside from the Platinum 8352V.

3rd Gen Intel Xeon Scalable Ice Lake SKU List And Value Analysis Small 1P

Although Intel does not have single-socket-optimized 16-core parts the way the AMD EPYC side does, Intel has a number of 16-core offerings, including the Xeon Silver 4314, which gets Optane PMem 200 support unlike the other Xeon Silvers. This is a sub-$700 part, which means there is likely not much room left for a dedicated single-socket 16-core value part.

Final Words

Across a large number of markets, Intel became more competitive with its Ice Lake Xeon generation. Still, we are now at the point where we describe chips as, for example, the 3rd Generation Intel Xeon Scalable “Ice Lake” Intel Xeon Platinum 8352V, which is for virtualization but only with small SGX enclaves of 8GB or less. When it takes that many words to describe a processor, without even mentioning its core count, clock speed, or cache, it is a good sign that the entire naming system needs an overhaul.

3rd Gen Intel Xeon Scalable Ice Lake SKU List And Value Analysis Large

This piece was a bit later than our normal SKU list and value analysis pieces, but it is one that is perhaps even more fascinating.

Processors price list, prices, descriptions, photos


CPU AMD A6 9500 (AD9500AG) 3.5 GHz/2core/SVGA RADEON R5/1 Mb/65W Socket AM4

Manufacturer: AMD; Model: A6-9500 APU with Radeon R5 Series; Frequency: 3.5 GHz or up to 3.8 GHz with Precision Boost

Article 324069/1


CPU Intel Core i3-8100 3.6 GHz/4core/SVGA UHD Graphics 630/ 6Mb/65W/8 GT/s LGA1151

Manufacturer: Intel; Model: Core i3-8100 Processor; Frequency: 3.6GHz

Article 325115/1


CPU Intel Core i3-10100F OEM {3.6GHz, 6MB, LGA1200}

Article 1804728/7

14 780
under order

10 520
available

Processor CPU Intel Core i3-10100 3.6 GHz/4core/SVGA UHD Graphics630/6Mb/65W/8 GT/s LGA1200

Manufacturer: Intel; Model: Core i3 10100; Frequency: 3.6 GHz or up to 4.3 GHz with Turbo Boost

Article 466222/1

10 240
available

CPU Intel Core i7-10700 2.9 GHz/8core/SVGA UHD Graphics 630/2+16Mb/65W/8 GT/s LGA1200

Manufacturer: Intel; Model: Core i7 10700; Frequency: 2.9 GHz or up to 4.8 GHz with Turbo Boost

Article 466272/1

25 750
available

CPU Intel Pentium Dual-Core E5400 2.7GHz/ 2Mb/ 800MHz LGA775

Manufacturer: Intel; Model: Pentium Processor E5400; Frequency: 2.7GHz

Article 82576/1

1 430
available

Processor Intel Pentium E5700 3.0GHz 2Mb 800MHz LGA775

Manufacturer: Intel; Model: Pentium Processor E5700; Frequency: 3.0GHz

Article 102545/1

1 500
available
3 070
available
13 120
available

CPU Intel Core i5-10400F Comet Lake OEM {CM8070104282719SRH79/CM80701042}

Article 1785413/7

10 890
available

CPU Intel Core i5-10400 2.9 GHz/6core/SVGA UHD Graphics 630/12Mb/65W/8 GT/s LGA1200

Article 1783751/7

16 040
available

CPU AMD Ryzen 7 5700G OEM

Article 1861281/7

21 040
available

CPU AMD A6 9500E (AD9500AH) 3.0 GHz/2core/SVGA RADEON R5/ 1 Mb/35W Socket AM4

Article 321073/1

2 500
available

CPU AMD A6 9500 OEM {3.5-3.8GHz, 1MB, 65W, Socket AM4}

Article 1486542/7

3 070
available

CPU AMD Ryzen 5 PRO 4650G OEM

Article 1803280/7

16 290
available

CPU Intel Core i3-10100F 3.6 GHz/4core/6Mb/65W/8 GT/s LGA1200

Manufacturer: Intel; Model: Core i3 10100F; Frequency: 3.6 GHz or up to 4.3 GHz with Turbo Boost

Article 482713/1

8 980
on order

CPU Intel Core i3-10100 Comet Lake OEM {3.6GHz, 6MB, LGA1200}

Article 1782819/7

10 400
on order

[Processor] CPU Intel Core i3-10100 Comet Lake OEM {3.6GHz, 6MB, LGA1200}

Article 1782819/3

9 680
on order
9 510
on order

[Processor] CPU Intel Core i5-10400 Comet Lake OEM {2.9GHz, 12MB, LGA1200 CM8070104282718/CM8070…

Article 1783751/2

CPU AMD Ryzen 3 3200G OEM {3.6GHz/Radeon Vega 8}

Article 1703965/7

11 370
on order
CPU Intel Core i5-10400 2.9 GHz/6core/SVGA UHD Graphics 630/12Mb/65W/8 GT/s LGA1200

Article 464450/1

15 740
on order

CPU Intel Core i3-10105F 3.7 GHz/4core/6Mb/65W/8 GT/s LGA1200

Article 512960/1

8 390
on order

CPU Intel Celeron G5900 3.4 GHz/ LGA1200

Article 466332/1

3 360
on order

CPU Intel Xeon Silver 4215R 3.2 GHz/8core/8+11Mb/130W/10.4GT/s LGA3647

Article 463208/1

112 280
on order

CPU Intel Core i7-11700F 2.5 GHz/8core/4+16Mb/65W/8 GT/s LGA1200

Article 503494/1

28 970
available

[Processor] CPU Intel Xeon Silver 4215R OEM

Article 1774797/7

92 700
on order

[Processor] CPU AMD Ryzen 5 5600G OEM

Article 1865691/7


14 760
available

[Processor] CPU Intel Core i5-10400 Comet Lake OEM {2. 9GHz, 12MB, LGA1200 CM8070104282718/CM8070…

Article 1121111285/7

12 650
on order

CPU Intel Core i3-12100 LGA1700

Article 573664/1

11 590
available

CPU Intel Core i3-12100F LGA1700

Article 584014/1

CPU Intel Core i5-12400F Alder Lake OEM {2.5 GHz/ 4.4 GHz Turbo, 18MB, LGA1700}

Article 1888262/7

15 050
on order

[Processor] CPU Intel Core i5-10400F Comet Lake OEM {CM8070104282719SRH79/CM80701042}

Article 1121111286/7

9 590
available

Intel Xeon Phi Compute Module Performance Test

  • NVIDIA® Tesla™ Computing System — GPU Computing Solutions

  • Processor comparison tables: Intel Xeon E5-2400, Intel Xeon E5-2600
  • Links to price list sections: Processors, Compute modules
  • Server configurators
  • List of server and workstation models that support Intel Xeon Phi

The Intel Xeon Phi compute module is a math coprocessor with double-precision performance of up to 1 TFLOPS — one trillion operations per second! That is roughly four times faster than a workstation based on two Intel Xeon E5-2650 processors (about 270 GFLOPS peak). Up to eight Intel Xeon Phi modules can be installed in one system.

Computing module (math coprocessor) Intel Xeon Phi

The Intel Xeon Phi compute module is based on the 64-bit x86 architecture, so existing programs do not need to be rewritten; it is usually enough to recompile them. This is unlike NVIDIA compute modules, which are based on the CUDA architecture and require specialized functions when programming.

In this article, we’ll report on the Intel Xeon Phi performance test results, and also discuss some aspects of parallel computing, that is, tasks that can be broken down into parallel branches or processes that run simultaneously.

Let’s clarify a few terms that we will use in this article.

A double-precision floating-point number is a computer representation of a number in normalized form according to the IEEE 754 standard. Such a number occupies 8 bytes of memory and is stored as a signed mantissa and an exponent. The mantissa gives about 16 significant decimal digits of precision, and the representable magnitudes range from roughly 10^-308 to 10^308.

A FLOP is one double-precision operation. Operations can be as simple as arithmetic addition and multiplication, or more complex: extracting a root, calculating a logarithm, raising to a power, or evaluating trigonometric functions. The maximum performance figures are achievable on the simple arithmetic operations.

FLOPS is the number of double-precision operations performed in one second.

In our tests, we used a two-processor Team Workstation P4000CR with two Intel Xeon E5-2650 processors based on the Intel Sandy Bridge microarchitecture.

The Intel Xeon E5-2650 processor has eight physical cores. Each core can perform 8 double-precision operations per processor cycle. Since the nominal operating frequency of the processor is 2 GHz, the peak theoretical performance of a single core is:

8 FLOP/clock * 2 GHz (cycles/second) = 16 GFLOP/second = 16 GFLOPS

How is this performance achieved?

A full cycle of processing one machine instruction by the processor core takes more than one clock cycle. It includes the following main steps: loading the instruction and data, decoding the instruction, executing it, and finally writing the result. This is a highly simplified picture; in reality, instruction processing involves more stages and a corresponding number of processor cycles.

The processor is organized as a pipeline. Each clock cycle, the next instruction is fed into the pipeline, the instructions already in the pipeline advance to their next stage, and the result of a completed instruction emerges at the output.

Thus, several instructions are processed in the pipeline simultaneously, offset by one cycle each. Thanks to this pipelined organization, the processor effectively completes one instruction per clock cycle.

But how does a processor perform not one, but eight operations per cycle?

The processor core can operate on four numbers at once thanks to its 256-bit registers: one register holds four 64-bit doubles. This is vector (SIMD) processing — the operation is applied to a vector of values rather than to a single number — and it is provided by the Intel AVX (Advanced Vector Extensions) instruction set extension.

The processor core also has several specialized ALUs (Arithmetic Logic Units) that perform different types of calculations. Additions and multiplications are handled by different ALUs, so they can be issued in the same cycle. Issuing a 4-wide AVX addition and a 4-wide AVX multiplication simultaneously is what allows the core to perform up to eight double-precision operations per clock cycle.

This technique, in which operations are performed on vectors instead of scalar operands, is called vectorization. Vectorization of a program happens automatically at compile time, and its efficiency can be improved by optimizing the program's source code.
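As a simple illustration (our own sketch, not code from the original test), a loop like the following DAXPY kernel is a textbook candidate for auto-vectorization: every iteration is independent, so the compiler can map it onto 256-bit AVX registers and process four doubles per instruction.

#include <stddef.h>

/* y[i] = a*x[i] + y[i] (DAXPY). Each iteration is independent, so the
 * compiler can vectorize it with AVX: four doubles per 256-bit register,
 * with a multiply and an add issued in the same cycle on Sandy Bridge. */
void daxpy(size_t n, double a, const double *x, double *y) {
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

Compiling with optimization enabled (for example, gcc -O3 -mavx) is normally enough for the compiler to vectorize this loop automatically; most compilers can also emit a vectorization report showing whether it succeeded.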

So, the core has a peak performance of 16 GFLOPS. Our test workstation with two 8-core processors has a total of 16 cores. Total theoretical workstation performance:

16 GFLOPS/core * 8 cores/CPU * 2 CPUs = 256 GFLOPS
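The same peak-performance arithmetic can be captured in a few lines; this small sketch of ours reproduces both the 256 GFLOPS workstation figure and, with the Xeon Phi parameters used later in the article, the roughly 1 TFLOPS figure.

#include <stdio.h>

/* Theoretical peak = FLOP per clock per core * frequency (GHz) * cores,
 * giving a result in GFLOPS. */
static double peak_gflops(double flop_per_clock, double ghz, int cores) {
    return flop_per_clock * ghz * cores;
}

int main(void) {
    /* Two Xeon E5-2650: 8 DP FLOP/clock, 2.0 GHz, 16 cores in total. */
    printf("Workstation:    %.0f GFLOPS\n", peak_gflops(8, 2.0, 16));
    /* Xeon Phi 3120A: 16 DP FLOP/clock, 1.1 GHz, 57 cores. */
    printf("Xeon Phi 3120A: %.0f GFLOPS\n", peak_gflops(16, 1.1, 57));
    return 0;
}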

To achieve this performance with a single program, the program has to be divided into 16 threads that execute in parallel, each on its own core. Because the threads run on a system with shared memory, they can, if necessary, communicate through global program variables stored in that memory.

To create such parallel programs that work in systems with shared memory, the standard OpenMP (Open Multi-Processing) has been developed — a set of compiler directives, libraries and environment variables designed for programming multi-threaded applications in multi-core and multi-processor systems with shared memory.

An example of a program that is easy to parallelize on a shared-memory system is a loop in which the result computed at each step does not depend on the results of previous steps. Such a loop can be split into several smaller loops, each executed by a separate parallel branch of the program.

Parallel executable code can be generated automatically by the compiler. You can also explicitly mark the section of the program to be parallelized by adding a special OpenMP pragma to the source code. Using pragmas, you can create threads, distribute work between them, control whether data is shared or private, and synchronize threads if they exchange data during the calculation.
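Here is a minimal sketch of what such a parallel loop looks like in practice (our own illustration, not the code used in the test). The first pragma splits independent iterations among the threads; the reduction clause in the second gives each thread a private partial sum and combines them at the end.

#include <stdio.h>
#include <omp.h>

#define N 10000000

int main(void) {
    static double x[N], y[N];
    double dot = 0.0;

    /* Initialize the vectors in parallel; iterations are independent. */
    #pragma omp parallel for
    for (long i = 0; i < N; ++i) {
        x[i] = 1.0;
        y[i] = 2.0;
    }

    /* Dot product: the reduction clause gives each thread a private
     * partial sum and combines them at the end of the loop. */
    #pragma omp parallel for reduction(+:dot)
    for (long i = 0; i < N; ++i)
        dot += x[i] * y[i];

    printf("threads=%d dot=%.1f\n", omp_get_max_threads(), dot);
    return 0;
}

Compiled with the compiler's OpenMP flag (for example, gcc -fopenmp), the number of threads can then be chosen at run time with the OMP_NUM_THREADS environment variable, the kind of environment-variable control the measurements below rely on.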

In many cases, letting the compiler add parallelism automatically gives a reasonably efficient result. The number of program branches at run time depends on the number of available processor cores: on a computer with a single-core processor the program runs sequentially, while on a computer with two 8-core processors 16 parallel branches run.

A parallel task can also be run on the nodes of a computing cluster. In this case, identical copies of the program, processes, are launched on the nodes. Each process is given a unique process number when it starts, which determines the role of its node in the overall computational work.

Processes on different nodes do not have access to the main memory of other nodes, so they cannot communicate through memory, as happens in shared-memory systems. The interaction between processes in this case is governed by the MPI (Message Passing Interface) standard, which describes how messages can be exchanged between the parallel processes of one task in systems with distributed memory.

Various physical interfaces can be used as the communication medium. In our testing, a Gigabit Ethernet network was used to communicate between cluster nodes. Each node was connected to the network via two ports, combined into an aggregated channel according to the 802.3ad standard.

An MPI task can also be run on a single computer. In this case, several parallel processes are launched on it at the same time; they use the computer's shared memory as the communication medium, but the data exchange happens not through shared program variables, as in the OpenMP model, but through the MPI message-passing protocol.
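For comparison, here is a minimal MPI sketch (again our own illustration): every process computes a partial sum over its own slice of the data, and the partial results are combined with an explicit message-passing call rather than through shared variables.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's number */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */

    /* Each process sums its own slice of 1..N. */
    const long N = 100000000L;
    double local = 0.0;
    for (long i = rank + 1; i <= N; i += size)
        local += (double)i;

    /* Combine the partial sums on rank 0 via message passing. */
    double total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("processes=%d sum=%.0f\n", size, total);

    MPI_Finalize();
    return 0;
}

Built with mpicc and launched with, say, mpirun -np 16, the same binary runs as 16 cooperating processes on one workstation or spread across cluster nodes.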

The Intel Xeon Phi 3120A compute module (the model we tested) has 57 physical cores and 6 GB of RAM, and runs at 1100 MHz. Thanks to its 512-bit registers, each Xeon Phi core performs up to 16 double-precision operations per clock cycle, so its theoretical peak performance is:

16 FLOP/clock * 1.1 GHz (cycles/second) * 57 cores = 1003 GFLOP/sec = 1.003 TFLOPS

Intel Xeon Phi 3120A Specifications:

  • Many Integrated Core (MIC), Knights Corner, 22 nm process
  • 57 compute cores, Hyper-Threading technology, 4 threads per core, 228 threads
  • 64-bit data width, 512-bit vector registers
  • L1 cache: 32 KB data / 32 KB instructions per core, 1 clock latency
  • L2 cache: 512 KB per core, 11 clock latency
  • 6 GB GDDR5 RAM, 16-channel memory controller

  • 352 GB/s peak memory subsystem bandwidth, 200 GB/s effective
  • PCIe 2. 0 x16 host interface, 6 GB/s throughput in each direction
  • double precision peak performance 1003 GFLOPS
  • power consumption 300 W, active cooling

The Intel Xeon Phi compute module runs its own Linux-based operating system and has a virtual disk with a file system, 6 GB of local RAM, a virtual network interface, and an IP address. From the point of view of the host (the workstation), it can be treated as an independent compute node.

Theoretically, up to eight Intel Xeon Phi compute modules can be installed in one system, but in practice, for existing server platforms, the realistic maximum is four modules. The limiting factors are the number of PCIe lanes (four modules need 64 of the 80 lanes available in a two-processor system), the power supply (four modules require a total of 1200W), and the workstation's cooling system, which is designed for at most four expansion cards dissipating 300W each.

The Intel Xeon Phi Compute Module has various usage models.

Native mode

In this mode, the Intel Xeon Phi Compute Module acts as a standalone computer.

A program to be run on the module must be compiled into executable code targeted at the module. The program, together with the necessary dynamic libraries, is then copied to the module's virtual disk (or to a network drive shared between the module and the host) and launched from the module console.

Note that in this usage model, the task size is limited by the module’s own RAM, which is 6 GB.

Heterogeneous cluster mode

A host with an Intel Xeon Phi compute module installed can be considered a two-node cluster. On such a cluster you can run either one task in parallel on both nodes, or two different tasks, each on its own node. In turn, several such hosts can be combined into a "large" heterogeneous cluster consisting of nodes of two types, on which several tasks can be executed simultaneously, choosing nodes of the appropriate type for each task.

Coprocessor mode, or "offload"

In this mode, the task is run on the host, and the coprocessor executes only certain parts of the program with a high degree of parallelism, for which execution on the coprocessor is more efficient than on the host.

In the source code of such a program, these sections are marked with special directives that tell the compiler to compile them for execution on the coprocessor. During task execution, these parts of the program are "offloaded" to the coprocessor and executed there.

In this mode, the size of the task is not limited by the size of the local memory of the computing module. We performed performance testing of Intel Xeon Phi in this mode.
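The offloaded sections are marked roughly as in the following simplified sketch, based on Intel's Language Extensions for Offload for the Knights Corner generation (the exact clauses depend on the compiler version; this is not the code used in our test).

#include <stdio.h>

#define N 1000000

/* The offload pragma asks the Intel compiler to run the marked block on
 * coprocessor 0, copying the 'in' buffer to the card and the 'out'
 * buffer back to the host when the block finishes. */
void scale(double *a, double *b, double factor) {
    #pragma offload target(mic:0) in(a:length(N)) out(b:length(N))
    {
        #pragma omp parallel for
        for (long i = 0; i < N; ++i)
            b[i] = factor * a[i];
    }
}

int main(void) {
    static double a[N], b[N];
    for (long i = 0; i < N; ++i) a[i] = (double)i;
    scale(a, b, 2.0);
    printf("b[10] = %.1f\n", b[10]);  /* expect 20.0 */
    return 0;
}

On a compiler without offload support the pragma is ignored and the block simply runs on the host, which is also what makes this usage model convenient to maintain.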

The following systems were used in our performance comparison test:

  1. Intel Xeon E5-2650 dual processor workstation, 128 GB DDR3-1600
  2. Cluster of four dual-processor workstations based on Intel Xeon E5-2650, 32 GB DDR3-1600

  3. Intel Xeon E5-2650 dual processor workstation, 128 GB DDR3-1600 with Intel Xeon Phi 3120A Compute Module installed.

Testing was performed on the Red Hat Enterprise Linux 6.3 operating system.

For performance testing, we used the High-Performance LINPACK (HPL) package, the benchmark used to compile the TOP500 list of supercomputers.

The test consists of solving a system of N linear equations using LU decomposition. The input data is a square N x N matrix of coefficients filled by a random number generator. In what follows, we use the term "problem size N" to mean an N x N coefficient matrix.

The choice of the HPL package as the test is due, first, to its popularity and availability, and also to the fact that its algorithm parallelizes and scales well. Finally, solving a system of linear equations is itself a problem that frequently arises in practical calculations.

Calculations were made with double precision. The total number of operations that must be performed during the test is known in advance and is given by the formula 2N^3/3 + 2N^2. Dividing this number by the duration of the test gives the overall performance of the system.
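In other words, converting an HPL run into a performance figure is a one-line calculation. A small sketch of it (ours, with made-up example numbers) for a given problem size N and measured wall-clock time:

#include <stdio.h>

/* HPL operation count: 2*N^3/3 + 2*N^2 double-precision operations.
 * Dividing by the elapsed time in seconds and by 1e9 gives GFLOPS. */
static double hpl_gflops(double n, double seconds) {
    double ops = (2.0 * n * n * n) / 3.0 + 2.0 * n * n;
    return ops / seconds / 1e9;
}

int main(void) {
    /* Example: an N = 40,000 run that takes 200 seconds (made-up numbers). */
    printf("%.1f GFLOPS\n", hpl_gflops(40000.0, 200.0));
    return 0;
}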

This test is a "good" parallel task: the initial matrix can be divided into blocks that are processed by separate parallel program branches. During the calculations, the parallel threads must exchange data to compute the values at the block boundaries.

We carried out tests for several problem sizes N, corresponding to "standard" amounts of RAM in computing systems.

As the problem size N increases, the total amount of computation grows proportionally to N cubed, while the amount of data exchanged between parallel processes grows roughly proportionally to N squared. The share of communication in the total run time therefore decreases while the share of computation increases, and since we measure the speed of computation rather than communication, the measured system performance rises with problem size. The exception is a sequential run, which contains no communication, so its performance should not depend on the problem size.

As mentioned above, the peak theoretical performance of a two-processor workstation based on Intel Xeon E5-2650 2 GHz should be:

2 GHz * 8 FLOP/clock * 8 cores * 2 CPUs = 256 GFLOPS

To measure performance in the OpenMP environment, we first compiled the program accordingly. By setting the number of parallel threads with environment variables, we measured performance for 1, 2, 4, 8, and 16 threads at problem sizes from 10,000 to 117,000. The results are shown in the graph.

Measurements were taken with Turbo Boost disabled. Intel Turbo Boost is a technology that raises the processor's operating frequency under peak load; the frequency increases automatically as long as the processor's heat dissipation stays within the set limit. When one core is loaded, its frequency can rise significantly (up to 2.8 GHz); when several cores are loaded at once, the increase is smaller. To prevent Turbo Boost from distorting the performance-scaling picture, we disabled it. We did, however, also measure 16-thread performance with Turbo Boost enabled: it was 274 GFLOPS, which exceeds the 256 GFLOPS theoretical limit for a 2 GHz clock. From this we can conclude that the average frequency of the processor cores during the Turbo Boost run was about 2.4 GHz:

274 GFLOPS / 230 GFLOPS (Performance at 2 GHz) * 2 GHz ≈ 2.4 GHz

So Turbo Boost is very effective even when all cores are fully loaded.

Note that both scales in the graph are logarithmic. Doubling the number of threads practically doubles the performance, which in all cases comes close to the theoretical limits on large problems. With small problem sizes, performance is lower because data-exchange and synchronization operations make up a larger share of the total work.

It is worth saying a few words about the use of Hyper-Threading in parallel calculations. This Intel technology allows two independent computation threads to execute on one core; the operating system "sees" each physical core as two logical ones and reports 32 CPUs on our workstation. However, if you use all 32 logical processors for parallel threads, performance drops, because two Hyper-Threading threads share the resources of a single physical core. You should not disable Hyper-Threading in the BIOS either, because, as our tests showed, that also reduces performance. Instead, use environment variables to arrange the computation so that one worker thread lands on each physical core.
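A minimal sketch of this thread-count bookkeeping (our own illustration; the PHYSICAL_CORES variable is a name we invented for the example, not a standard setting):

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(void) {
    /* OpenMP only reports logical CPUs (32 here with Hyper-Threading on),
     * so the number of physical cores is passed in by the user. */
    int logical = omp_get_num_procs();
    const char *env = getenv("PHYSICAL_CORES");    /* hypothetical variable */
    int physical = env ? atoi(env) : logical / 2;  /* assume 2-way SMT */
    if (physical < 1) physical = 1;

    omp_set_num_threads(physical);  /* one compute thread per physical core */
    printf("logical CPUs: %d, using %d threads\n", logical, physical);

    #pragma omp parallel
    {
        #pragma omp single
        printf("team size: %d\n", omp_get_num_threads());
    }
    return 0;
}

In our runs this was done purely with environment variables — OMP_NUM_THREADS=16 together with the Intel OpenMP runtime's affinity setting (KMP_AFFINITY) — rather than in code.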

Thus, in a shared-memory system such as our workstation, organizing parallel computing with OpenMP gives an almost linear performance increase as the number of parallel threads grows. For one thread on one core we got 15 GFLOPS, and on 16 cores 230 GFLOPS, so the loss was about 4%: 230 / (15 * 16) ≈ 0.96.

Unlike the OpenMP environment, where one program is launched and splits into parallel branches (threads) during execution, in the MPI environment several copies of the program (processes) are launched. They also work in parallel, but exchange data not through program variables stored in shared memory but through the message-passing interface, MPI. Processes can run on a single workstation or on several nodes combined into a cluster; in the first case the workstation's RAM serves as the communication medium, and in the second the network interface does. On the workstation, as in the previous test, we measured 1, 2, 4, 8, and 16 processes, and on the four-node cluster, 64 processes.

It turned out that for a workstation, performance in the MPI environment is almost the same as performance in the OpenMP environment. Therefore, if the performance of one workstation is sufficient, it is advisable to solve the problem of organizing parallel computing using OpenMP tools, since it is much easier to do this — you can parallelize the task automatically at the compilation stage. The use of MPI tools requires additional efforts to refine the program.

Cluster performance depends strongly on the problem size. With a small problem, the cluster is slower than even a single node, because the data exchange between processes greatly slows down the calculation. As the problem grows, the relative share of communication decreases and total performance rises. The maximum cluster performance in our tests was 633 GFLOPS, which is 2.3 times faster than a single node. The shape of the curve shows that performance keeps growing with problem size and, for a sufficiently large problem, can come quite close to the theoretical peak. The curve can also be "shifted" upward by using a faster, lower-latency communication medium than Gigabit Ethernet, such as InfiniBand.

We tested the Intel Xeon Phi 3120A in offload mode. In this mode, the task is launched on the workstation, after which the main computational part of the program is offloaded to the coprocessor and executed there with a large number of parallel threads. Importantly, in offload mode, unlike native mode, the task size is not limited by the compute module's 6 GB of RAM. We were able to run a 64 GB task, but a 128 GB task still caused a coprocessor memory overflow.

The results of testing the workstation in the OpenMP environment, the cluster in the MPI environment, and the compute module in offload mode are shown in the following diagram.

At all problem sizes, the compute module showed an overwhelming advantage, outperforming the workstation by roughly a factor of four. We were thus able to confirm from our own experience that Intel has developed a truly revolutionary product that radically increases parallel-computing performance without significant cost: the compute module costs less than half as much as the workstation used in the testing.

So what are the prerequisites for efficient computing using the Intel Xeon Phi coprocessor?

  1. Algorithms with a high «density» of regular calculations and a high degree of parallelism. According to Intel recommendations, the Xeon Phi coprocessor is suitable for tasks with a possible number of parallel branches of at least 100.
  2. Powerful workstation with Intel Xeon Phi Compute Module installed. For example, Team Workstation P4000CR.
  3. The operating system installed on the workstation must be compatible with the Intel Xeon Phi Compute Module. Intel is constantly working to expand the list of such operating systems.
  4. Compiler with instruction set support for MIC (Many Integrated Core) microarchitecture. For example, the Intel compiler.

We currently offer the following models of servers and workstations supporting Intel Xeon Phi:

1) Workstation Team P4000CR – up to 4 Intel Xeon Phi coprocessors
2) Server Team P4000IP – up to 4 Intel Xeon Phi coprocessors
3) Server Team P4000CO – up to 2 Intel Xeon Phi coprocessors
4) Server Team R2000GZ – up to 2 Intel Xeon Phi coprocessors
5) Server – up to 2 Intel Xeon Phi coprocessors
6) Server Team R2000LH – up to 2 Intel Xeon Phi coprocessors
7) Team P4000 – 1 Intel Xeon Phi coprocessor
8) Server Team R1000JP – 1 Intel Xeon Phi coprocessor

You can add the Intel Xeon Phi coprocessor to the server or workstation directly using the links provided.