Intel memory bandwidth: Memory Subsystem: Bandwidth — Sizing Up Servers: Intel’s Skylake-SP Xeon versus AMD’s EPYC 7000

by Johan De Gelas & Ian Cutress on July 11, 2017 12:15 PM EST


Measuring the full bandwidth potential with John McCalpin’s STREAM bandwidth benchmark is getting increasingly difficult on the latest CPUs, as core and memory channel counts have continued to grow. We compiled the STREAM 5.10 source code with either version 17 of the Intel compiler (icc) for Linux or GCC 5.4, both 64-bit. The following compiler switches were used on icc:

icc -fast -qopenmp -parallel (-AVX) -DSTREAM_ARRAY_SIZE=800000000

Notice that we had to increase the array significantly, to a data size of around 6 GB. We compiled one version with AVX and one without.
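The "around 6 GB" figure can be checked with a little arithmetic. The sketch below assumes STREAM's default element type (a 64-bit double) and its three working arrays; neither detail is stated in the text above, so treat the totals as an estimate rather than a measured footprint.

```python
# STREAM_ARRAY_SIZE taken from the compiler switches above
STREAM_ARRAY_SIZE = 800_000_000
BYTES_PER_ELEMENT = 8  # assumption: default STREAM element type is a 64-bit double
NUM_ARRAYS = 3         # assumption: STREAM works on three arrays (a, b, c)


def array_gb(elements: int, bytes_per_element: int = BYTES_PER_ELEMENT) -> float:
    """Size of one array in decimal gigabytes."""
    return elements * bytes_per_element / 1e9


per_array = array_gb(STREAM_ARRAY_SIZE)
total = per_array * NUM_ARRAYS
print(f"per array: {per_array:.1f} GB, all arrays: {total:.1f} GB")
```

Under these assumptions each array is a little over 6 GB, far larger than any cache in the tested systems, which is the point of increasing the size.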

The results are expressed in gigabytes per second.

Meanwhile the following compiler switches were used on gcc:

-Ofast -fopenmp -static -DSTREAM_ARRAY_SIZE=800000000

Notice that the DDR4 DRAM in the EPYC system ran at 2400 MT/s (8 channels), while the Intel system ran its DRAM at 2666 MT/s (6 channels). So the dual socket AMD system should theoretically get 307 GB per second (2.4 GT/s × 8 bytes per channel × 8 channels × 2 sockets). The Intel system has access to 256 GB per second (2.666 GT/s × 8 bytes per channel × 6 channels × 2 sockets).
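The theoretical figures follow from a single multiplication, sketched here for both systems. The function name and its parameters are illustrative only; the inputs are the transfer rates and channel counts quoted above.

```python
def peak_bandwidth_gbs(transfers_mt_s: float, channels: int, sockets: int,
                       bytes_per_transfer: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (decimal gigabytes)."""
    bytes_per_second = transfers_mt_s * 1e6 * bytes_per_transfer * channels * sockets
    return bytes_per_second / 1e9


epyc_peak = peak_bandwidth_gbs(2400, channels=8, sockets=2)
xeon_peak = peak_bandwidth_gbs(2666, channels=6, sockets=2)
print(f"EPYC 7601 (2S): {epyc_peak:.1f} GB/s")
print(f"Xeon 8176 (2S): {xeon_peak:.1f} GB/s")
```

These are upper bounds set by the memory interface; measured STREAM results always land below them.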

AMD told me they do not fully trust the results from the binaries compiled with ICC (and who can blame them?). Their own fully customized stream binary achieved 250 GB/s. Intel claims 199 GB/s for an AVX-512 optimized binary (Xeon E5-2699 v4: 128 GB/s with DDR4-2400). Those kinds of bandwidth numbers are only available to specially tuned AVX HPC binaries.

Our numbers are much more realistic, and show that given enough threads, the 8 channels of DDR4 give the AMD EPYC server a 25% to 45% bandwidth advantage. This is less relevant in most server applications, but a nice bonus in many sparse matrix HPC applications.

Maximum bandwidth is one thing, but that bandwidth must be available as soon as possible. To better understand the memory subsystem, we pinned the stream threads to different cores with numactl.
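The exact numactl invocations are not given, so the following is only a sketch of how such pinned runs could be launched. The binary name `./stream` and the CPU numbers are hypothetical, and which logical CPUs share a core or a die depends on the system's own numbering. `--physcpubind` and `--localalloc` are standard numactl options, and `OMP_NUM_THREADS` is the standard OpenMP thread-count variable.

```python
def numactl_command(cpus, binary="./stream"):
    """Build a shell command that restricts an OpenMP STREAM run to the given CPUs.

    `binary` is a hypothetical path; memory is allocated on the node local to
    the CPU that first touches it (--localalloc).
    """
    cpu_list = ",".join(str(c) for c in cpus)
    return (f"OMP_NUM_THREADS={len(cpus)} "
            f"numactl --physcpubind={cpu_list} --localalloc {binary}")


# Two threads on different cores (CPU numbers are an assumption)
two_cores = numactl_command([0, 1])
# Eight threads spread over every fourth core
eight_spread = numactl_command(range(0, 32, 4))
print(two_cores)
print(eight_spread)
```

Note that numactl restricts the whole process to the listed CPUs; pinning each thread to exactly one CPU would additionally need the OpenMP runtime's own affinity settings.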

Pinned Memory Bandwidth (in MB/sec)

Mem Hierarchy | AMD «Naples» EPYC 7601 DDR4-2400 | Intel «Skylake-SP» Xeon 8176 DDR4-2666 | Intel «Broadwell-EP» Xeon E5-2699v4 DDR4-2400
1 Thread | 27490 | 12224 | 18555
2 Threads, same core, same socket | 27663 | 14313 | 19043
2 Threads, different cores, same socket | 29836 | 24462 | 37279
2 Threads, different socket | 54997 | 24387 | 37333
4 threads on the first 4 cores, same socket | 29201 | 47986 | 53983
8 threads on the first 8 cores, same socket | 32703 | 77884 | 61450
8 threads on different dies (cores 0, 4, 8, 12, ...), same socket | 98747 | 77880 | 61504

The new Skylake-SP offers mediocre bandwidth to a single thread: only 12 GB/s is available despite the use of fast DDR4-2666. The Broadwell-EP delivers 50% more bandwidth with slower DDR4-2400. It is clear that Skylake-SP needs more threads to get the most out of its available memory bandwidth.

Meanwhile a single thread on a Naples core can get 27.5 GB/s if necessary. This is very promising, as this means that a single-threaded phase in an HPC application will get abundant bandwidth and run as fast as possible. But the total bandwidth that one whole quad core CCX can command is only 30 GB/s.

Overall, memory bandwidth on Intel’s Skylake-SP Xeon behaves more linearly than on AMD’s EPYC. All of the Xeon’s cores have access to all the memory channels, so bandwidth more directly increases with the number of threads.
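The difference in scaling can be made concrete by dividing each result by the single-thread figure. This is a small sketch using the MB/s values from the pinned bandwidth table; the two-thread entries use the "different cores, same socket" row, and the variable names are illustrative.

```python
# MB/s values copied from the pinned memory bandwidth table
skylake = {1: 12224, 2: 24462, 4: 47986, 8: 77884}
epyc_first_cores = {1: 27490, 2: 29836, 4: 29201, 8: 32703}
epyc_8_threads_different_dies = 98747


def speedups(results):
    """Bandwidth relative to the single-thread result, per thread count."""
    base = results[1]
    return {threads: round(mb_s / base, 2) for threads, mb_s in results.items()}


skylake_scaling = speedups(skylake)
epyc_scaling = speedups(epyc_first_cores)
epyc_dies_scaling = round(epyc_8_threads_different_dies / epyc_first_cores[1], 2)

print("Skylake-SP:", skylake_scaling)
print("EPYC, first cores:", epyc_scaling)
print("EPYC, 8 threads on different dies:", epyc_dies_scaling)
```

On the Xeon the ratio climbs steadily with the thread count, while on EPYC it stays close to 1 until the threads are spread across dies.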


How High-Bandwidth Memory Will Break Performance Bottlenecks

Intel recently announced that High-Bandwidth Memory (HBM) will be available on select “Sapphire Rapids” Xeon SP processors and will provide the CPU backbone for the “Aurora” exascale supercomputer to be sited at Argonne National Laboratory.

Paired with Intel’s Xe HPC (codenamed “Ponte Vecchio”) compute GPUs running in a unified CPU/GPU memory environment, Aurora will deliver more than an exaflop/sec of double-precision performance. Realizing or exceeding an exaflop/sec performance metric using 64-bit data operands means programmers don’t have to take shortcuts or accept precision compromises by using reduced-precision arithmetic. It does mean that the memory system has to deliver data far more rapidly than previous generations of processors. Along with HBM for AI and data intensive applications, the Sapphire Rapids Xeon SPs also implement the Advanced Matrix Extensions (AMX), which leverages the 64-bit programming paradigm to speed tile operations and gives programmers the option of using matrix reduced-precision operations for convolutional neural networks and other applications.

Maintaining sufficient bandwidth to support 64-bit exascale supercomputing in an accelerated, unified memory computing environment is a significant achievement that is cause for serious excitement and raises expectations in both the enterprise and HPC communities. The unified memory environment means, as Argonne notes: “Programming techniques already in use on current systems will apply directly to Aurora.” By extension, institutional, enterprise and cloud datacenters will be able to design highly optimized systems using next generation Intel Xeon SPs for simulation, machine learning, and high performance data analytic workloads (or succinctly HPC-AI-HPDA) using applications written to run on existing systems.

Rick Stevens, associate laboratory director of computing for environment and life sciences at Argonne National Laboratory, codifies the significance of the achievement and need for HBM when he writes: “Achieving results at exascale requires the rapid access and processing of massive amounts of data. Integrating high-bandwidth memory into Intel Xeon Scalable processors will significantly boost Aurora’s memory bandwidth and enable us to leverage the power of artificial intelligence and data analytics to perform advanced simulations and 3D modeling.”

Why Is HBM Important

It has been known for a number of years that the ability of modern processors and GPUs to deliver flops has been rapidly outpacing the ability of memory systems to deliver bytes/sec. John McCalpin, the author of the well-known STREAM benchmark, noted in his SC16 invited talk Memory Bandwidth and System Balance in HPC Systems that peak flop/sec per socket was increasing by 50 percent to 60 percent per year while memory bandwidth has only been increasing by approximately 23 percent per year. He illustrated this trend with the following graph, where he charted the flops to memory bandwidth balance ratio of commercially successful systems with good memory performance relative to their competitors since 1990. Computer vendors are aware of the memory bandwidth problem and have been adding more memory channels and using faster memory DIMMs.

Comparison of memory bandwidth to floating-point capability for commercially successful platforms since 1990. (Source: John McCalpin https://sites.utexas.edu/jdm4372/2016/11/22/sc16-invited-talk-memory-bandwidth-and-system-balance-in-hpc-systems/)
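To see how quickly these growth rates diverge, the compounding can be worked out directly. The ten-year horizon below is an arbitrary choice for illustration; the annual rates are the ones quoted from McCalpin's talk.

```python
def growth(rate_per_year: float, years: int) -> float:
    """Compound growth factor after the given number of years."""
    return (1 + rate_per_year) ** years


YEARS = 10  # assumption: illustrative horizon, not taken from the talk

bandwidth_growth = growth(0.23, YEARS)
flops_growth_low = growth(0.50, YEARS)
flops_growth_high = growth(0.60, YEARS)

# How much the flops-per-byte balance ratio worsens over the period
ratio_low = flops_growth_low / bandwidth_growth
ratio_high = flops_growth_high / bandwidth_growth
print(f"bandwidth grows {bandwidth_growth:.1f}x")
print(f"flops grow {flops_growth_low:.0f}x to {flops_growth_high:.0f}x")
print(f"balance ratio worsens {ratio_low:.1f}x to {ratio_high:.1f}x")
```

Even at the lower flops growth rate, a decade of compounding leaves each flop with several times less memory bandwidth than before.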

HBM devices reflect an alternative approach that utilizes 3D manufacturing technology to create stacks of DRAM chips built on top of a wide bus interface. An HBM2e device, for example, connects the DRAM stack to the processor through a bus interface of 1,024 bits. This wide data interface and associated command and address requires that the DRAM be built on top of a silicon interposer that essentially “wires” up the approximately 1,700 lines required for the HBM read/write transactions. The silicon approach is necessary as it is impractical to create such a large number of lines using printed circuit board (PCB) technology.

Schematic of an HBM 2.5D Memory system using a single DRAM stack (Source: https://semiengineering.com/hbm-issues-in-ai-systems/)

The result is a huge jump in memory bandwidth and a significant savings in power over DDR memory systems. EEWeb notes that “a single HBM2e device consumes almost half the power as for a GDDR6 solution.” It concludes, “HBM2e gives you the same or higher bandwidth than GDDR6 and similar capacity, but power consumption is almost half, while TOPS/W are doubled.” The TOPS or Tera Operations Per Second is a measure of the maximum achievable throughput given the bandwidth of the memory device. It is used to evaluate the best throughput for the money for an application such as neural networks and data intensive AI applications.

The Past is Prelude to the Future — Memory Bandwidth Benchmarks Tell the Story

Benchmarks demonstrate the impact of memory bandwidth increases on HPC applications quite well. Intel recently published an apples-to-apples comparison between a dual-socket Intel Xeon-AP system containing two Intel “Cascade Lake” Xeon SP-9282 Platinum processors and a dual-socket AMD “Rome” 7742 system. As can be seen below, the Intel Xeon SP-9200 series system, with twelve memory channels per socket (24 channels in the two-socket configuration), outperformed the AMD system, with eight memory channels per socket (sixteen in total with two sockets), by a geomean of 29 percent on a broad range of real-world HPC workloads.

Impact of twelve memory channels versus eight memory channels on a variety of HPC benchmarks (Source: Only memory bound results reported in https://www.datasciencecentral.com/profiles/blogs/cpu-vendors-compete-over-memory-bandwidth-to-achieve-leadership)

The reason is that these benchmarks are dominated by memory bandwidth while others are compute-bound as shown below:

Sensitivities of various HPC workloads to memory and compute limitations (Source: https://medium.com/performance-at-intel/hpc-leadership-where-it-matters-real-world-performance-b16c47b11a01)

oneAPI Heterogeneous Programming Enables Next Gen Capabilities

The compute versus memory bandwidth bottleneck dichotomy illustrated in the chart above highlights how the combined efforts of the oneAPI initiative can help solve a multitude of compute and memory bottlenecks at the same time in an environment using a combination of CPUs, GPUs, and other accelerators. Succinctly, high memory bandwidth is fundamental to keeping multiple devices in a system and the per-core computational units supplied with data. Once there is sufficient bandwidth to prevent data starvation, then programmers can get to work to overcome the compute bottlenecks by making changes to the software.

The oneAPI heterogeneous programming approach helps enable these purpose-built, cutting-edge capabilities.

HBM memory: Very simply, high computational performance cannot be achieved when the compute cores and vector units are starved for data. As the name implies, and as presented in this article, HBM delivers high memory bandwidth.

Unified Memory Environment: A unified memory space gives both CPUs and accelerators such as the Intel Xe compute GPU the ability to access data in a straightforward manner. This means users can add an Intel GPU based on the Xe architecture or the Xe HPC microarchitecture to speed compute-bound problems that are beyond the capabilities of the CPU cores. The additional bandwidth of the HBM memory system helps keep multiple devices busy and supplied with data.

Intel AMX instructions: Intel added the AMX instructions to speed SIMD processing of some heavily utilized compute-bound operations in AI and certain other workloads. Core to the AMX extensions is a new matrix register file with eight rank-2 tensor (matrix) registers, referred to as tiles. The programmer is able to configure the number of rows and bytes per row in the tile through a tile control register (TILECFG). This gives programmers the ability to adapt the characteristics of the tile to more naturally represent the algorithm and computation. The Sapphire Rapids Xeon SPs support the full AMX specification including AMX-TILE, AMX-INT8, and AMX-BF16 operations.

oneAPI Cross-architecture Programming: oneAPI’s open, unified, cross-architecture programming model lets users run a single software abstraction on heterogeneous hardware platforms that contain CPUs, GPUs, and other accelerators across multiple vendors. Central to oneAPI is the Data Parallel C++ (DPC++) project that brings Khronos SYCL to LLVM to support data parallelism and heterogeneous programming within a single source code application. SYCL is a royalty-free, cross-platform abstraction layer built entirely on top of ISO C++, which eliminates concerns about applications being locked in to proprietary systems and software. DPC++ enables code reuse across different hardware targets such as CPUs, GPUs, and FPGAs, either individually or by orchestrating all the devices in a system into a powerful combined heterogeneous compute engine that can perform computations concurrently on the varied system devices. A growing list of companies, universities, and institutions are reporting the benefits of oneAPI and its growing software ecosystem.
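For the AMX tiles described in the list above, the effect of choosing rows and bytes per row can be illustrated with simple arithmetic. The limits used here (8 tiles, at most 16 rows, at most 64 bytes per row) are assumptions drawn from Intel's public AMX documentation rather than from this article, and the helper is a hypothetical model of a tile shape, not an interface to the hardware.

```python
# Assumed architectural limits of an AMX tile (not stated in the article)
NUM_TILES = 8
MAX_ROWS = 16
MAX_BYTES_PER_ROW = 64


def tile_shape(rows: int, bytes_per_row: int, element_bytes: int):
    """Return (rows, columns) in elements for one configured tile."""
    if not 1 <= rows <= MAX_ROWS:
        raise ValueError("rows out of range")
    if not 1 <= bytes_per_row <= MAX_BYTES_PER_ROW:
        raise ValueError("bytes_per_row out of range")
    if bytes_per_row % element_bytes:
        raise ValueError("bytes_per_row must be a multiple of the element size")
    return rows, bytes_per_row // element_bytes


int8_tile = tile_shape(16, 64, element_bytes=1)  # INT8 elements
bf16_tile = tile_shape(16, 64, element_bytes=2)  # BF16 elements
tile_file_bytes = NUM_TILES * MAX_ROWS * MAX_BYTES_PER_ROW
print(int8_tile, bf16_tile, tile_file_bytes)
```

Under these assumed limits, a fully sized tile holds twice as many INT8 elements per row as BF16 elements, which is why the row and byte counts are left configurable.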

Looking To The Future

Of course, everyone wants to know how much memory bandwidth the new Intel Xeon Scalable HBM memory system will provide. This information still remains to-be-announced. According to Mark Kachmarek, who is Xeon SP HBM product manager at Intel: “The new high-bandwidth memory system for Intel Xeon processors will provide greater bandwidth and capacity than was available on the Intel Xeon Phi product family.” This provides a lower bound, which is exciting.

The real bandwidth of the Sapphire Rapids HBM memory system will be defined by the number of memory channels and performance of the HBM devices on each channel. Current HBM2 devices deliver between 256 GB/sec and 410 GB/sec, which gives us an idea of the performance potential of a modern HBM2 stacked memory channel. The number of memory channels supported by the HBM-enabled Sapphire Rapids Xeon SPs has not yet been announced.
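The quoted per-device range is consistent with the 1,024-bit interface mentioned earlier in the article multiplied by a per-pin data rate. The pin rates in this sketch (2.0 and 3.2 Gbit/s) are assumptions chosen because they reproduce the quoted figures; the article does not state them.

```python
def stack_bandwidth_gbs(bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Peak bandwidth of one HBM stack in GB/s."""
    return bus_width_bits * pin_rate_gbps / 8  # 8 bits per byte


BUS_WIDTH_BITS = 1024  # interface width quoted in the article

low = stack_bandwidth_gbs(BUS_WIDTH_BITS, 2.0)   # assumed pin rate
high = stack_bandwidth_gbs(BUS_WIDTH_BITS, 3.2)  # assumed pin rate
print(f"{low:.1f} GB/s to {high:.1f} GB/s per stack")
```

Total system bandwidth would then scale with the number of stacks, which is the figure the article notes has not been announced.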

Rob Farber is a global technology consultant and author with an extensive background in HPC and machine learning technology development that he applies at national labs and commercial organizations. Rob can be reached at [email protected].


Intel Core i5-8250U — 104 secret facts, review, specifications, reviews.

PassMark CPU score

Intel Core i5-8250U:
5764
Best score:
89379

Test results

Intel Core i5-8250U:
48009
Best score:

Technology

Intel Core i5-8250U:
3000
Best score:

Performance

Intel Core i5-8250U:
4271
Best score:

Memory Specification

Intel Core i5-8250U:
2030
Best score:

Description

The Intel Core i5-8250U processor runs at a base frequency of 1.6 GHz, and the maximum frequency in Boost mode reaches 3.4 GHz. It has 4 cores. The L1 cache is 256 KB, L2 is 1 MB and L3 is 6 MB. Power consumption at peak times can reach 15 watts.

The maximum number of threads that the Intel Core i5-8250U can work with is 8.

The Intel Core i5-8250U is built on a 14 nm process.

Regarding the memory specification: the Intel Core i5-8250U supports DDR4 at up to 2400 MHz. The maximum supported capacity is 32 GB, the maximum memory bandwidth is 37.5 GB/s, and 2 memory channels are supported.

As for integrated graphics, the Intel Core i5-8250U uses the Intel UHD 620. The base frequency of the graphics system is 300 MHz, and the maximum frequency can reach 1.1 GHz.

Now about the Intel Core i5-8250U tests. According to PassMark, the processor scored 5764 points. Based on the analysis of more than 4000 processors, the Intel Core i5-8250U ranked 638th.

Why the Intel Core i5-8250U is better than others

Cinebench21.5 score (single): 6. This parameter is higher than that of 9%

Thermal Dissipation (TDP): 15 W. This parameter is lower than that of 91%

Technological process: 14 nm. This parameter is lower than that of 89%

PassMark CPU score: 5764. This parameter is lower than that of 18%

Processor RAM: 32 GB. This parameter is lower than that of 7%

Geekbench 5 (Multi-Core): 2741.04. This parameter is lower than that of 3%

Geekbench 5 score: 758.16. This parameter is lower than that of 3%

Number of cores: 4. This parameter is lower than that of 55%


Intel Core i5-8250U Review: Highlights

PassMark CPU score

PassMark CPU Mark is a synthetic benchmark that combines integer, floating-point and other compute tests into a single processor score.

5764

max 89379

Average: 6033.5

89379

Geekbench 5 (Multi-Core)

2741.04

max 23628.202

Average: 5219.2

23628.202

Geekbench 5 score

758.16

max 1600.56

Average: 936.8

1600.56

Cinebench21.5 score (single)

A test that determines processor performance using a thread of execution.

6

max 51

Average: 5.6

51

3DMark06 test score

6043

max 18628

Average: 3892.6

18628

Test score Cinebench R11.5 /64bit (Multi-Core)

5.7348

max 45.3622

Average: 5.3

45.3622

Cinebench R15 test score (Multi-Core)

544

max 4614

Average: 638.4

4614

Cinebench R15 test score (Single-Core)

137

max 276

Average: 128.5

276

AES

Yes

Support for Intel Optane memory

Yes

Thermal Control Technologies

Yes

Intel Privacy Protection Technology

Yes

Function Execute override bit

Yes

Intel Trusted Execution Technology

No

Number of threads

The more threads, the higher the performance of the processor, and it will be able to perform several tasks at the same time.

8

max 256

Average: 10.7

256

L1 cache size

A larger L1 cache speeds up the CPU and improves overall system performance.

256KB

max 4608

Average: 299.3 KB

4608KB

L2 Cache Size

A larger L2 cache increases processor speed and overall system performance.

1MB

max 512

Average: 4.5 MB

512MB

L3 cache size

A larger L3 cache speeds up the CPU and improves overall system performance.

6MB

max 768

Average: 16.3 MB

768MB

Maximum Turbo Clock Speed 

When the processor is running below its limits, it can boost to a higher clock speed to improve performance.

3.4GHz

max 5.5

Average: 3.2 GHz

5.5GHz

Number of cores

4

max 72

Mean: 5.8

72

Processor base clock speed

1.6GHz

max 4.7

Average: 2.5 GHz

4.7GHz

Frequency with Intel Turbo Boost Technology 2.0

3.4GHz

max 5.1

Average: 3.5 GHz

5.1GHz

Max. number of PCI Express lanes

12

max 64

Average: 22.7

64

PCI Express

Configurations: 1×4, 2×2, 1×2+2×1 and 4×1

Idle states

Yes

Turbo Boost technology

Turbo Boost is a technology that allows the processor to operate at a frequency above its base clock. This increases its performance (including when performing complex tasks).

2

Mean: 1.9

2

Graphics

Intel UHD 620

Max. graphics system frequency

1.1GHz

max 1.55

Average: 1.1 GHz

1.55GHz

Number of PCI-Express lanes

12

Max. number of processors in configuration

1

Mean: 1.3

8

DDR version

4

Mean: 3.5

5

Max. memory bandwidth

This is the speed at which the device stores or reads information.

37.5GB/s

max 352

Average: 41.4 GB/s

352GB/s

Memory frequency

Faster RAM can improve system performance.

2400MHz

max 4800

Average: 2106.2 MHz

4800MHz

Max. number of memory channels

The greater the number, the higher the data transfer rate from memory to processor

2

max 16

Mean: 2.9

16

Max. memory size

The largest amount of RAM memory.

32GB

max 6000

Average: 404.4 GB

6000GB

System bus frequency

Data between computer components and other devices is transferred via the bus.

4 GT/s

max 1600

Average: 156.1 GT/s

1600 GT/s

ECC memory support

Error-correcting code (ECC) memory is used when data corruption must be avoided, for example in scientific computing or on servers. It detects possible errors and repairs data corruption.

No

Processor RAM

32GB

max 128

Average: 34.8 GB

128GB

Max. resolution (DP)

[email protected]

vPro

No

Enhanced SpeedStep (EIST)

Yes

OpenCL

4.4

max 4.6

Average: 4.1

4.6

Intel® AES-NI Instructions

AES is required to speed up encryption and decryption.

Yes

Hyper-Threading Technology

Many Intel processors use hyper-threading technology, in which each processor core works on two threads simultaneously, which significantly increases performance. Most other processors run one thread per core, so their performance is lower.

Yes

OpenGL

Later versions provide quality game graphics

4.4

max 4.6

Mean: 4.4

4.6

AVX

AVX allows you to increase the speed of calculations in multimedia, financial and scientific applications, it also improves the performance of Linux RAID.

Yes

SSE version

SSE speeds up multimedia tasks (such as adjusting the volume of the sound). Each subsequent version has a number of improvements.

4.2

max 4.2

Average: 4.1

4.2

Support 4K

You can enjoy the highest quality images

Yes

Socket

FC-BGA1356

My WiFi

Yes

Speed Shift

Yes

Thermal Monitoring

Yes

Flex Memory Access

Yes

SIPP

No

Smart Response

Yes

TSX

No

TXT

No

EDB

Yes

Secure Key

Yes

Identity Protection

Yes

SGX

Yes

OS Guard

Yes

VT-d

Yes

VT-x

Yes

EPT

Yes

AMD Virtualization Technology

Yes

Quick Sync Video

Yes

Clear Video

Yes

Clear Video HD

Yes

eDP

Yes

DisplayPort

Yes

HDMI

Yes

DVI

Yes

Process technology

The small size of the semiconductor means it is a new generation chip.

14 nm

Average: 36.8 nm

5 nm

Heat Dissipation (TDP)

Thermal design power (TDP) is the maximum amount of heat that the cooling system needs to dissipate. The lower the TDP, the less power will be consumed.

15W

Average: 67.6 W

0.025W

PCI Express Revision

3

Mean: 2.9

5

Status

Launched

Release date

07/01/2017

Embedded options available

No

Package size

42mm X 24mm

Device ID

0x5917

GPU base clock

The graphics processing unit (GPU) has a high clock speed.

300MHz

max 2400

Average: 535.8 MHz

2400 MHz

Supports 64-bit system

A 64-bit system, unlike a 32-bit system, can support more than 4 GB of RAM. This increases productivity. It also allows you to run 64-bit applications.

Yes

DirectX

Used in demanding games, providing enhanced graphics

12

max 12.1

Average: 12

12.1

Maximum processor temperature

If the maximum temperature at which the processor operates is exceeded, a reset may occur.

100°C

max 110

Average: 96°C

110°C

Turbo GPU

When the GPU is running below its limits, it can boost to a higher clock speed to improve performance.

1100MHz

max 2100

Average: 1091 MHz

2100MHz

Monitor support

Multiple monitors can be connected to the device, which makes it easier to work by increasing the working space.

3

Mean: 2.9

4

Codename

Kaby Lake R

Destination

Mobile

FAQ

Can the Intel Core i5-8250U work in 4K mode?

Yes.

How many PCIe lanes does the Intel Core i5-8250U have?

12.

How much RAM does the Intel Core i5-8250U support?

The Intel Core i5-8250U supports 32 GB.

How fast is the Intel Core i5-8250U?

The processor runs at 1.6 GHz.

How many cores does the Intel Core i5-8250U have?

4 cores.

Does the Intel Core i5-8250U support ECC memory?

No.

Does the Intel Core i5-8250U have integrated graphics?

Yes, Intel UHD 620.

Which RAM does the Intel Core i5-8250U support?

The Intel Core i5-8250U supports DDR4.

What is the socket of the Intel Core i5-8250U?

The Intel Core i5-8250U uses the FC-BGA1356 socket.

Is the Intel Core i5-8250U a 64-bit processor?

Yes.

What architecture does the Intel Core i5-8250U use?

The Intel Core i5-8250U is based on the Kaby Lake R architecture.

What frequency does the Intel Core i5-8250U processor run at?

The Intel Core i5-8250U processor runs at 1.6 GHz.

What is the maximum frequency of the Intel Core i5-8250U processor?

The maximum frequency in Boost mode reaches 3.4 GHz.

How much cache does the Intel Core i5-8250U have?

The L1 cache is 256 KB, L2 is 1 MB and L3 is 6 MB.

How many watts does the Intel Core i5-8250U consume?

Power consumption at peak times can be up to 15 watts.

Test Intel Core i5-12600K with DDR4 and DDR5 — i2HARD

Evgeny Serov

November 28, 2021

Comparison with R5 5600X and i5-11600K stock and overclocked on Windows 10 and Windows 11

Hello everyone, this is i2hard. Slowly but surely, we are working our way through the new generation of Intel processors. Next up is the i5-12600K.

The processor promises to be very interesting: compared to the previous generation, it brings a new microarchitecture with higher IPC, larger second- and third-level caches, support for DDR5, and 4 controversial energy-efficient cores. All of this is topped off with a new 10-nanometer process technology.

Test bench

So let’s not beat around the bush and get on with the testing. There is a lot of information, but it is interesting, so don’t skip ahead.

Everything is relative, so we need rivals for comparison. The previous generation of the i5 series is represented by the 11600K, and AMD by the six-core Ryzen 5 5600X. The DDR4 kit, with a total capacity of 32 GB, is built on dual-rank Samsung B-die chips; for DDR5 we have to make do with what is available for now. The Adata XPG Lancer kit contains two 16-gigabyte single-rank modules with chips manufactured by Micron. Since we have two types of memory, there are also two motherboards for the 12600K: the ASUS ROG Maximus Z690 Hero has DDR5 slots, and the ASUS TUF Gaming Z690-PLUS WIFI D4 has DDR4 slots.

We have already talked about the first board, so we will devote a little time to the TUF. Its external features can be seen with the naked eye. The cooler mounting holes are compatible with the previous generation. The main things to keep in mind are that the processor sits lower and that the socket retention bracket is fastened with four screws, not three.

Compared to its namesake on the Z590 chipset, the number of M.2 connectors has increased to four, and there are now four SATA ports instead of six.

The rear panel lost PS/2, but gained an additional Type-C connector.

The VRM also received an upgrade. Now 14 80-amp DrMOS SiC659 power stages are responsible for powering the processor cores, even though the new processors are less power-hungry.

Full configuration:

Graphics Card #1: Palit GeForce RTX 3080 Ti GameRock OC

Graphics Card #2: MSI GeForce GTX 1050 2G OC

Processor #1: Intel Core i5-11600K

Processor #2: Intel Core i5-12600K

Processor #3: AMD Ryzen 5 5600X

Motherboard #1: ASUS ROG Maximus XIII Hero

Motherboard #2: ASUS TUF Gaming Z690-plus WIFI D4

Motherboard #3: ASUS ROG Maximus Z690 Hero

Motherboard #4: ASRock B550 Taichi Razer Edition

DDR4 RAM: G.SKILL Trident Z F4-3200C14D-32GTZ 2×16 GB

DDR5 RAM: A-Data XPG LANCER AX5U5200C3816G-DCLABK 2×16 GB

Cooling System #1: GamerStorm Castle 360RGB v2

Cooling System #2: DeepCool AK620

Drive: Crucial MX500 2TB

Power supply: Deepcool DQ850-M-V2L

Case: open test bench

Stock tests

Synthetic tests

We start with synthetics. AIDA64 shows greatly increased memory latency with DDR4 compared to the previous generation; it is almost the same as on the Ryzen. The bandwidth of DDR5 is pleasing, but the dual-channel mode inside each module comes at the cost of even higher latency. The low speeds of the first- and second-level caches are also striking, but this is a quirk of how AIDA64 itself calculates them, mixing the cache speeds of the large and small cores. The ring bus frequency is not encouraging either: although it is dynamic, it is clearly lower than on the previous-generation i5.

However, in benchmarks and tasks that do not depend on memory speed, the new i5 shows a huge lead. The performance of the large cores has increased by 21% over the previous generation in CPU-Z, and with all cores in use the 12600K is one and a half times ahead of its six-core rivals.

In Cinebench R23, we see a similar increase, and even a little more in the multi-thread test.

Geekbench 5 is memory dependent, so high-frequency DDR5 allows the 12600K to score 10% more points in the multi-thread test than with DDR4.

CPU rendering in Premiere Pro also responds very well to high memory bandwidth, much better than to the increase in core count and per-core performance. Rendering with DDR5 is 25% faster than with DDR4.

Consumption and temperatures

And since the new i5 is so good in synthetics, we need to talk about cooling.

The last generation of Intel processors proved to be a serious test for small cooling systems: heat was easy to remove from the large die, but there was a lot of it.

The new processors have switched to a finer process technology, which also causes some concern, because we all remember how hot the Ryzen 3000 series ran after the transition to the 7 nm process. After all, it is more difficult to remove heat from a smaller die area. Add to this the increased power limits, for which motherboard manufacturers are now responsible. And they tune them to the board's power delivery system, not to your cooling system.

But along with the smaller die, the area of the processor's heat spreader has increased, and the thickness of the silicon and solder has decreased.

As a result, the new processors generate less heat, but it is still harder to remove. Keep in mind that these figures vary from sample to sample. At the same time, the Ryzen, with its 75 watt PBO limit, heats up almost as much. In any case, the low TDP of the 12600K will let you comfortably use ordinary tower coolers with four heat pipes at stock settings.

Game tests

We played with synthetics, now it’s time to play games.

Call of Duty: Warzone, graphics settings: eSports, DLSS: performance. And... the new i5 is somehow unimpressive. There are more cores, more cache, higher IPC, and almost no gain over the previous generation. Even with DDR5 it is not impressive: 8% higher average FPS. An online match is, of course, not the pinnacle of accuracy and the margin of error is considerable, but so far the result does not impress.

Cyberpunk, graphics preset: ultra ray tracing, DLSS: ultra performance, crowd density at maximum.
The i5s with DDR4 are equal. Almost no difference. Moreover, this game parallelizes even better than Warzone, judging by the core load graph. At the same time, DDR5, still in its infancy, gives a 17% boost in this game compared to the modest XMP DDR4. Ryzens have never been good at Cyberpunk, so the 5600X falls behind even the 11600K.

Far Cry 6, ultra graphics preset, ray tracing enabled, FSR: performance. In this game, let’s not talk about FPS, but about how the 12th generation behaves on Windows 10. As you can see, the 12600K stutters a lot. If you follow our channel, you know from the i7-12700K test that the stutters stopped after a game update. Those tests were done by Vitaly; these were done by Dmitry. Dmitry has all drivers and updates installed. Only switching to Windows 11 or disabling the small cores got rid of the stutters for him. He completely reinstalled Windows and the game, and even deployed an image of Vitaly’s system, and the stutters remained. And that sums up the whole situation. You can’t know when or where it will happen, but sooner or later you will run into such a problem, so be prepared to upgrade to a newer version of Windows with better software support for Thread Director, the module built into the processor that is responsible for distributing the load among the cores.

Shadow of the Tomb Raider. Highest graphics preset, DLSS: performance. It is interesting to watch how utilization of the 12600K’s Hyper-Threading threads consistently dips as the load increases when the market appears on screen. In theory, it is even correct that the load lands on the small cores first, because they are still more productive than a virtual thread and they have their own cache. But once again, the almost identical FPS on the i5s of different generations is troubling. Naturally, the results were rechecked, Gear 2 did not switch on by itself, and on Windows 11 the results are not much different, with the exception of Far Cry. The investigation led to a strange trait of the 12th generation processors: a wild hunger for memory. Simulating an XMP profile of 3600 MHz CL16 instead of our 3200 CL14 gave a 10% boost in games out of nowhere. That is a lot. Yes, of course, more cores and their higher performance contribute to this, but the i7-11700K was not slower than the 11600K, so the 12600K should not be this memory-limited, and yet it is. We look forward to overclocking.

Watch Dogs Legion, ultra graphics preset, DLSS: performance. As in all previous games, the i5-12600K with entry-level (in terms of frequency) XMP DDR4 memory is practically indistinguishable from the 11600K. Likewise, entry-level DDR5 gives it an additional 14% FPS. The 5600X keeps up with the i5s on DDR4.

StarCraft II, all settings at maximum. What does StarCraft love? Two fast cores and fast data access. High bandwidth and 10 cores are useless to it. The 12600K has the first: high IPC. Latency, as you remember, is noticeably higher than on the 11600K, especially with DDR5, so the situation here is slightly different. With DDR5, the FPS is the same as on the 11600K, but with DDR4 the processor delivers 10% more frames.

Troy. Graphics preset: ultra, grass and unit size: extreme, the resolution is lowered by the modifier. If only it were like this everywhere. All threads are loaded to capacity. A huge gap from the previous generation, as well as from Ryzen: a third higher FPS for the new i5. The high memory bandwidth of DDR5 only plays into the hands of the loaded cores. It gives another 16% increase, which leads to a one-and-a-half-times lead over the rivals. Just like in the synthetic tests, only this is a game.

Stock test results

On average across the board, the new i5 delivers 7% more FPS than the previous one when both use DDR4 memory.

But you know what? Troy is, of course, good, but if we leave it out, this is what actually comes out: the 12600K is just 3% faster on average than the previous i5. A catastrophically small difference for such architectural changes.
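How much a single outlier can move such an average is easy to sketch in Python. The per-game ratios below are hypothetical placeholders, not the measurements from this review; only the idea of averaging with and without Troy comes from the text, and the helper name `average_gain_percent` is our own.

```python
from statistics import mean

# Hypothetical 12600K / 11600K average-FPS ratios per game.
# These are placeholders for illustration, NOT the review's measured data.
fps_ratio = {
    "Warzone": 1.01,
    "Cyberpunk": 1.00,
    "Tomb Raider": 1.01,
    "Watch Dogs": 1.01,
    "StarCraft II": 1.10,
    "Troy": 1.33,
}


def average_gain_percent(ratios, exclude=()):
    """Arithmetic mean of the per-game gains in percent, skipping excluded games."""
    kept = [ratio for game, ratio in ratios.items() if game not in exclude]
    return (mean(kept) - 1.0) * 100.0


with_troy = average_gain_percent(fps_ratio)
without_troy = average_gain_percent(fps_ratio, exclude=("Troy",))

print(f"all games:    +{with_troy:.1f}%")
print(f"without Troy: +{without_troy:.1f}%")
```

With only a handful of games in the list, one heavily multithreaded title is enough to shift the mean by several percentage points, which is why the average is quoted both ways.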

But if you did not skip ahead and read carefully, then you remember that the abnormal memory hunger is to blame. From which we conclude: if you need a new processor only for games and you are not going to overclock it, then look for memory modules with a higher-frequency XMP profile. At 3600 MHz in Gear 1 there should be no problems; the limits have shifted, so do not choke the processor with slow memory. This advice has always been relevant, but now more than ever.

Overclocking

Finally, let’s move on to overclocking.

i5-12600K and competitor setup

The 5600X reached 4700 MHz on the cores at about 1.325 V under load, and its memory hit the usual limit of 3800 MHz with a first timing of 14. The 11600K, thanks to its large die area, lets you set a higher voltage, but even so it managed only 4900 MHz at 1.43 V under load. Its memory ran at up to 3733 MHz with a first timing of 14 in Gear 1 mode. Our 12600K, let’s say right away, is not the best sample, but its big cores still reached 5 GHz at 1.325 V under load. The power-efficient cores managed only 3.9 GHz, and the ring bus 4.1 GHz. The memory controller is not the best either: for 3900 MHz in Gear 1 mode it already required a high SA voltage. DDR5 on Micron chips, as you know, currently ranks last among the options available from the three manufacturers. It often runs no higher than 5600 MHz, and more often 200 MHz lower. So we got 5400 MHz with a first timing of 34. Memory overclocking clearly benefited the 12600K: in both cases the latency dropped by 10 ns, which slightly narrowed the gap to the 11600K. The bandwidth is also pleasing; we are gradually getting back what was once lost.

Since the new i5 now runs 100 MHz higher than the old one rather than lower, its lead in synthetic tests that do not depend on memory speed has naturally grown slightly.
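To put the memory settings quoted above side by side, here is a minimal Python sketch that converts a first timing (CAS latency) into nanoseconds and estimates theoretical peak bandwidth. It assumes that the "MHz" figures are effective data rates in MT/s and that every system runs two 64-bit channels (for DDR5, the two 32-bit subchannels of a module are counted as one 64-bit channel). The function names are our own. The latency a benchmark reports covers the whole path through the memory controller, so the 10 ns improvement mentioned above cannot be derived from CAS alone.

```python
def first_word_latency_ns(data_rate_mts, cas_latency):
    """CAS latency converted to nanoseconds.

    DDR transfers data twice per clock, so the memory clock in MHz is
    data_rate / 2 and one clock cycle lasts 2000 / data_rate nanoseconds.
    """
    return cas_latency * 2000.0 / data_rate_mts


def peak_bandwidth_gbs(data_rate_mts, channels=2, bytes_per_channel=8):
    """Theoretical peak bandwidth in GB/s (assumes 64-bit, i.e. 8-byte, channels)."""
    return data_rate_mts * bytes_per_channel * channels / 1000.0


# (label, data rate in MT/s, first timing) as quoted in the text
configs = [
    ("DDR4-3200 CL14 (stock XMP)", 3200, 14),
    ("DDR4-3733 CL14", 3733, 14),
    ("DDR4-3800 CL14", 3800, 14),
    ("DDR5-5400 CL34", 5400, 34),
]

for label, rate, cl in configs:
    print(f"{label}: {first_word_latency_ns(rate, cl):.2f} ns CAS, "
          f"{peak_bandwidth_gbs(rate):.1f} GB/s peak")
```

The sketch illustrates the trade-off the review keeps running into: DDR5 offers far more theoretical bandwidth, while tightly timed DDR4 keeps the shorter CAS delay.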

Synthetic overclocking tests

Both in CPU-Z and Cinebench, almost 20,000 points in multithreading, which is even more than the i9-10900K. By the way, would you be interested to see their separate comparison?

DDR5’s lead has shrunk in Geekbench, but it is still ahead. And this is despite the AVX-512 instructions, which this benchmark loves so much, being disabled because of the energy-efficient cores.

In Premiere Pro, DDR4 has also narrowed the gap to DDR5: the latter now renders the project 12% faster rather than 25%. Oh, if only we had memory on Hynix chips.

Power consumption and temperatures

Let’s compare the power consumption and temperatures of the processors again, starting at equal voltage. Surprisingly, under these conditions the power consumption of the two i5s evened out. At the wall, both systems also drew about 273 watts. But remember that the 11600K has a slightly lower frequency, and the 12600K has 4 more small cores working at full capacity. At the same time, Ryzen, with 20% lower power consumption, ran the hottest.

We take into account that when overclocking Intel processors, many people go by the processor’s temperature rather than its voltage. After we raised Vcore by 100 mV, the processor gained 200 MHz on the cores and the ring bus, although along the way we also picked up an additional 40 watts of heat, which the liquid cooling system removes about as efficiently as the 160 watts from the 12600K.

But all of this was tested on liquid cooling, albeit not a custom loop. What if you have an air cooler? Let’s check, taking the DeepCool AK620 tower as an example. It has two tower sections and 120 mm fans, does not overhang the G.Skill modules, and is only 160 mm tall.

An interesting situation emerges. On Ryzen, the maximum temperature rose by 5°C and the average by 3.5°C. For the 11600K, by 4 and 3°C, respectively. And for the 12600K, by only 1 and 2°C. We checked several times. It really makes little difference.

However, keep in mind that these are measurements on an open test bench. In a case, a liquid cooling system will increase the internal airflow, plus hot air from a powerful video card will not blow directly into the cooler’s fans.

Overclocked game tests

Now for the most interesting part: games after overclocking.

In Warzone, the 12600K is now clearly ahead. Overclocking the cores, cache and DDR4 memory increased FPS by 50! That is almost 30%, a very decent gain. For comparison, with DDR5 the full overclock gave 21% more frames. The 11600K sped up by 19% and the 5600X by just 9.5%.

Cyberpunk also gains from overclocking. Here, the 12600K with DDR4 is 31% faster. But even this was not enough to overtake the DDR5 version. It is nice to see that DDR5, even in its weakest form, has an advantage at least somewhere. The other processors also gained more from overclocking than in Warzone, so the i5s on DDR4 differ by 10% instead of 12. Not that much.

To remind you once again that Windows 10 has problems with the new processors, we did not add test footage from Windows 11. Such moments really do occur in everyday use: sometimes TestMem5 ends up running only on the energy-efficient cores after the first cycle, sometimes Premiere does. Cinebench R15, by the way, also runs only on them. Even ordinary 7-Zip archiving can trigger a similar problem. We are not dissuading you, but we are warning you. Not everything goes as smoothly as we would like.

In Lara, overclocking raised the average FPS by almost 60, that is, by a third. A completely different level of performance, you must agree. But the average in Lara is not very representative. At the beginning of the scene, processors with a large cache pad their average FPS, but when it comes to the market, the situation changes. Ryzen is 30 FPS behind the 12600K at this point, as is the 11600K. DDR5 is also ahead here, although not by much, unlike in the average FPS.

In Watch Dogs, likewise, performance increased by 30%, while with DDR5 and on the other processors the gain was about 15%. You see how badly it was being choked by XMP 3200 MHz? However, once again DDR5 is slightly ahead. And the more resource-intensive the game or task, the greater the benefit of high-frequency memory. We found this out when comparing different modes of the memory controller divider.

In StarCraft, the situation is reversed again. After overclocking, the 12600K lost part of its advantage: its lead shrank from 9% to 5%. Still, with DDR4 it is in first place, while with DDR5 it is in last place.

Surprisingly, DDR5’s lead in Troy is not as large as expected: only 4%. And whereas the 12600K with DDR4 was a third faster than the 11600K at stock, now this lead has increased to 48%. That is almost one and a half times. Imagine if it were like this everywhere. It’s mind-blowing!

Overclocking results

So. On average across the board, the situation is now much more favorable. The new i5 is 17% faster than the previous one on average.

But let’s take Troy off the list again.

And now it is 12%. Not that much, if you remember the Ryzens, which gained 20% or even more in games with each of the last two generations.

Of course, the platform is fresh and not everything is perfect, but blindly trusting in Thread Director software optimizations or the like is not worth it. What if they never come? Unless faster DDR5 provides a bigger lead.

Integrated graphics tests

The increase in processor performance is, of course, good. But did you know that the integrated graphics have also undergone a change?

You can’t tell from the name that it is very different. It was UHD 750, it became UHD 770. The number of blocks and cores is the same, and FLOPS are 11% higher, but that matches the increased frequency.

So what has changed? The microarchitecture. The UHD 750 was Gen 12.1, this one is 12.2. That doesn’t sound like much either, does it? But it’s worth checking.

CS:GO, settings at minimum, 1080p. Surprisingly, compared to the last test three months ago, the UHD 750 has dropped in the statistics for rare and very rare events. But the UHD 770 is making progress. What? Same FPS? You are looking in the wrong place. Brightness! It is normal! On previous generations of integrated graphics it was always very low, on some maps you could strain your eyes, but now it works as it should. Isn’t that progress?

But the GTX 1050 or even the GT 1030 is still as far away as the moon, you know.

Resident Evil Village, minimum graphics settings, 1080p, FSR: performance. Here the difference is already impressive: the new graphics are a third faster. It is not clear why. Either the drivers are better, or the new microarchitecture works better with DirectX 12.

Recall that at FPS below 30 the game slows down, so the footage is out of sync, and the GTX 1050 has low FPS due to lack of video memory. Surprisingly, the advantage of DDR5 when using integrated graphics is small.

Overclocking is also interesting on the new integrated graphics. Look, the UHD 750 in the 11600K went from 1300 to 1600 MHz. +300 MHz is a 23% higher frequency. Guess how far the 770 goes?
To the maximum! The highest multiplier is 42. That is, with a 50 MHz step per multiplier, we get 2100 MHz. By raising BCLK we hit the limit at 2200 MHz, but we did not bother with that. Do you see the potential here? The frequency is almost 1.5 times higher. Let’s go check the gains.
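The clock arithmetic above can be checked with a short Python sketch. The 50 MHz step, the multiplier of 42 and the UHD 750 clocks are taken from the text; the UHD 770 stock clock of 1450 MHz is our assumption, since the text does not state it, and the function names are hypothetical.

```python
IGPU_STEP_MHZ = 50  # frequency added per multiplier unit, as stated in the text


def igpu_clock_mhz(multiplier, step_mhz=IGPU_STEP_MHZ):
    """Integrated graphics clock for a given multiplier."""
    return multiplier * step_mhz


def gain_percent(stock_mhz, overclocked_mhz):
    """Relative frequency increase in percent."""
    return (overclocked_mhz - stock_mhz) / stock_mhz * 100.0


# UHD 750: both clocks are quoted in the text.
uhd750_gain = gain_percent(1300, 1600)

# UHD 770: the overclocked value follows from the highest multiplier (42).
uhd770_oc_mhz = igpu_clock_mhz(42)

# Assumption: stock boost clock of the UHD 770 in the 12600K (not in the text).
UHD770_STOCK_MHZ = 1450
uhd770_gain = gain_percent(UHD770_STOCK_MHZ, uhd770_oc_mhz)

print(f"UHD 750: +{uhd750_gain:.1f}% frequency")
print(f"UHD 770: {uhd770_oc_mhz} MHz, +{uhd770_gain:.1f}% frequency")
```

Under that assumption the UHD 770 gains roughly twice as much frequency as the UHD 750, which is in line with the "almost 1.5 times" remark.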

In the AIDA64 GPGPU Benchmark, FLOPS increased in line with the frequency: by 23% and 44%, respectively. For unknown reasons, read and write speeds are lower with DDR5.

In CS:GO, overclocking the integrated graphics and memory gave a 33% gain for the UHD 750, 44% for the 770 on DDR4 and 41% with DDR5. The 1050 gained 11% from overclocking. Not quite what we expected, but still a very good result. If only the frame times were more even, it would be perfect.

Here are the gains from overclocking the integrated graphics in Resident Evil, from left to right: 38, 48 and 44%. The 1050 barely changed in FPS, as it spends a lot of time communicating with system RAM over the PCI Express bus, which is why the UHD 770 overtook it.

Surprisingly, the new integrated graphics received a good boost. Of course, it is not quite right to judge by two games, but still.