The Difference Between CPU Cores and Threads
Whether you are a software developer or simply someone who relies on technology daily, getting the most out of your business investments is a priority. From infrastructure to hardware to software, every dollar that goes out should provide a demonstrable benefit and return on investment. If it does not, you may be moving in the wrong direction.
One place often overlooked when choosing technology? The right CPU for your needs. Perhaps decisions are made without an understanding of the role of the CPU. Maybe this detail seems an afterthought in the grand scheme of things.
Whatever the case, ignoring the utility and cost of the right CPU for your needs is a missed opportunity to maximize investments and outcomes in technology.
Why It Pays to Understand CPU Cores and Threads
Understanding the difference between cores and threads can help you make informed decisions about how to maximize performance. Let’s start with a few key concepts:
- Cores are physical processing units.
- Threads are virtual sequences of instructions given to a CPU.
- Multithreading allows for better utilization of available system resources by dividing tasks into separate threads and running them in parallel.
- Hyperthreading further increases performance by allowing processors to execute two threads concurrently.
With those four details in mind, examine the difference between CPU cores and threads so the next time you’re investing in infrastructure or hardware, you make the right decision for your business.
What Is a CPU?
A CPU (central processing unit) is essentially the brain of the computer: it processes and carries out instructions. CPUs come in many varieties, such as single-core, dual-core, quad-core, and multi-core processors. The more cores a processor has, the more tasks it can work on at once; a GPU takes the same idea further, with many simple cores working in parallel.
In addition to executing instructions from programs, the CPU also manages other components of the system like RAM (random access memory), HDD (hard disk drive), or SSD (solid-state drive). The CPU is responsible for coordinating and communicating with the other components. It’s important to choose the right CPU, depending on what types of tasks you plan to do.
Your CPU is likely to be different if you’re running applications and workflows vs storing archives and legacy files. CPUs vary widely in performance, power consumption, and cost. The activities the CPU performs will materially impact the right choice for your business needs and budget.
Understanding CPU Cores
The number of cores in a system determines how many programs and tasks it can execute at once. A single-core processor can work on one task at a time, while a quad-core processor can handle up to four simultaneous tasks. As the number of cores increases, so does throughput, though an individual task does not automatically run faster.
Single-core CPUs are cheaper than multi-core CPUs, and they consume less power. This makes them great options for laptops, tablets, and other mobile devices. They also work well if the tasks you need to complete are relatively simple or don’t require too much multitasking. On the other hand, they will lack the performance of a multi-core CPU.
A multi-core CPU is ideal for multitasking and for running applications that demand high performance or process large datasets. This type of processor can divide tasks among the cores, allowing each to handle its own piece. The trade-off is that a multi-core CPU requires more energy and more supporting hardware.
The Difference Between a Core and a CPU
So what exactly is the difference between a CPU and its cores? Think of the cores as the wheels of a vehicle and the CPU as the vehicle itself. The wheels are what actually move the vehicle, and adding more wheels adds power and stability.
A CPU refers to the whole computer chip, while the number of cores present on a single CPU can vary. If you have ever bought a personal computer, you may have seen descriptions that include dual-core or quad-core, referring to two or four processing cores, respectively.
Understanding CPU Threads
A thread is a sequence of instructions given to the CPU by a program or application. The more threads a CPU can execute at once, the more tasks it can complete.
Threading in a CPU is a technique that can increase the speed and efficiency of multitasking. It enables multiple threads of execution to run simultaneously on one or more cores in a single processor, allowing for quicker response times and more efficient use of resources.
Threading is used in many different types of applications, including desktop software programs, web browsers, mobile apps, databases, and server-side software components. By using threads effectively, developers can create powerful solutions that make use of all available resources in a computer or network environment.
When multiple threads are running simultaneously, it’s called multithreading.
For example, if a user needs to perform complex calculations on a large set of data, then a single thread can spend more time on the calculation while other threads are available to handle other tasks. This helps ensure that all tasks are completed in an efficient manner with minimal impact on overall performance.
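As a hedged sketch of the scenario above (the names and numbers are illustrative), Python's standard `threading` module can dedicate one thread to a long calculation while another thread stays free for lighter work:

```python
import threading

result = {}
log = []

def long_calculation(data):
    # The heavy computation runs in its own thread...
    result["total"] = sum(x * x for x in data)

def handle_other_task():
    # ...while a second thread remains available for other tasks.
    log.append("request handled")

calc = threading.Thread(target=long_calculation, args=(range(1000),))
other = threading.Thread(target=handle_other_task)
calc.start()
other.start()
calc.join()
other.join()
print(result["total"], log[0])
```

One honest caveat: in CPython the global interpreter lock prevents two threads from executing Python bytecode on two cores at the same instant, so this pattern keeps a program responsive rather than making pure-Python math faster; true parallelism needs processes or a library that releases the lock.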
Many modern processors support hyperthreading, a technology that presents one physical core to the operating system as two logical cores, allowing the CPU to work on two threads of execution simultaneously. This increases system performance by improving the utilization of available resources and increasing throughput.
Multithreading is a process during which a single processor executes multiple threads simultaneously. This allows the processor to divide tasks into separate threads and run them in parallel, thereby increasing the utilization of available system resources and improving performance.
Multithreading also helps reduce latency by allowing different processes to run in parallel rather than one at a time. It can also be used to help increase the number of tasks that can be executed in any given period of time.
Hyperthreading further increases the performance of multi-core processors by allowing them to execute two threads concurrently. The process works by sharing the resources of each core between two threads. That way, both can be active at the same time while accessing the same cache memory, registers, and execution units.
This allows the processor to take advantage of otherwise idle resources and improve performance. Hyperthreading can also result in higher power consumption than running one thread per core, since it keeps more of each core's execution units busy.
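A quick way to observe hyperthreading from software (a sketch; exactly what the count reflects varies by OS and configuration) is that Python's `os.cpu_count()` reports logical processors, meaning hardware threads, not physical cores:

```python
import os

logical = os.cpu_count()  # logical processors (hardware threads)
print(f"Logical processors visible to the OS: {logical}")
# With 2-way hyperthreading enabled this is typically twice the
# number of physical cores; with it disabled, the two numbers match.
# Counting physical cores portably needs a third-party library such
# as psutil, or platform-specific sources like /proc/cpuinfo.
```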
The compounding effect of hyperthreading means that today’s CPUs can process an incredible number of tasks simultaneously.
The Difference Between Cores and Threads
The main difference between cores and threads is that a core is an individual physical processing unit, while threads are virtual sequences of instructions.
The performance of a computer depends on the number of cores AND the threading technique. For example, a computer with a quad-core CPU will benefit from multithreading as it utilizes several cores. Meanwhile, a hyperthreading technique can further increase the number of threads that can be active by splitting a single core into two virtual cores, allowing them to run multiple threads.
The trade-off is that such strength often comes at a cost: a higher price, greater power consumption, and an improvement in overall performance that only sometimes materializes. It's critical to understand not only the technical specifications of the CPUs you're considering but also how your organization will be using them.
Cores and threads are two important components of any modern computer system. Understanding their roles can help you get the most out of your machine. This helps you make informed decisions about how to best use your resources for maximum performance. For example, learning about the difference between cores and threads can be helpful when deciding how to upgrade or optimize your server’s processing power.
If you’re looking for more performance, investing in a multi-core processor with hyperthreading technology may be an option to consider. It can also help you decide whether you should invest in your own server or look for an appropriate partner to fit your business requirements.
A trusted partner like Liquid Web can help you determine which CPUs, number of cores, and threading architecture will provide the best return on investment and performance for your particular needs.
To learn more about what CPUs are the best choice for your infrastructure needs, contact the team at Liquid Web today. From a single VPS to a dedicated cloud deployment, our engineers can help design, deploy, and manage the infrastructure necessary to drive your business forward.
What Are Threads in Computer Processors? A Detailed Explanation
By Vicky
You may have heard that a CPU with many threads can handle multiple intensive processes at once. But what are threads in a CPU? In this post, MiniTool Partition Wizard explains the concept. Scroll down to figure it out.
To better understand what threads are in a CPU, you need a basic understanding of what a CPU is.
The CPU (central processing unit) is the part of the computer that dictates how the computer performs and determines how well it completes its tasks.
How does a CPU work in a computer? To put it simply, the CPU takes data from an application or program, performs a series of calculations, and executes the commands.
Now, you can scroll down to learn about threads in the CPU.
What Are Threads in CPU?
CPU threads are the virtual components that divide a physical CPU core into multiple virtual cores. They help deliver the workload to the CPU more efficiently.
Generally, when you open an application, the operating system creates a thread for performing all the tasks of that specific application, and the CPU is fed tasks from the thread.
Intel and AMD are the main processor manufacturers. To create threads, Intel CPUs use Hyper-Threading while AMD CPUs use SMT (simultaneous multithreading). Both break physical cores into virtual cores (threads) to increase performance.
How Many Threads in A CPU?
More threads mean higher performance and a greater ability to run many processes at once. What determines the CPU thread count? The number of cores in the CPU. With simultaneous multithreading, each CPU core can run two threads. Thus, a CPU with two cores has four threads, and a CPU with eight cores has 16 threads.
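The arithmetic in this paragraph is simple enough to capture in a one-line helper (a sketch; it assumes uniform 2-way SMT, which not every CPU has):

```python
def total_threads(physical_cores, threads_per_core=2):
    # Assumes every core supports the same SMT width (2-way by default).
    return physical_cores * threads_per_core

print(total_threads(2))  # 2 cores -> 4 threads
print(total_threads(8))  # 8 cores -> 16 threads
```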
CPUs were originally built with a single core. To figure out how many CPU cores and threads you have, you can refer to How Many Cores Does Your CPU Have.
Which CPU Has the Most Threads?
The more threads a CPU has, the better the system will perform. Here are several high-performance CPUs with many threads:
- The Intel Core i9-7960X has 16 cores and 32 threads. Featuring clock speeds from 2.80 GHz up to 4.20 GHz and a 22 MB cache, it is ideal if you are looking for high performance.
- The Intel Core i9-7980XE has 18 cores and 36 threads. Its base clock speed is 2.6 GHz and its L3 cache is 24.75 MB, making it one of the fastest and most powerful processors on the market.
- The AMD Ryzen Threadripper 1950X boasts 16 cores and 32 threads. Its clock speed and L3 cache, at 4.0 GHz and 32 MB, exceed those of the two Intel CPUs above.
CPU Threads vs Cores
There is a lot of confusion around CPU threads and cores. This part briefly covers the differences between them.
Firstly, CPU cores are the actual hardware component, while CPU threads are the virtual component for managing the tasks.
Secondly, CPU cores do all the heavy lifting, while CPU threads help the CPU schedule its tasks more efficiently. If a CPU does not support hyper-threading or multithreading, tasks are scheduled less efficiently, forcing the CPU to work harder to access the information relevant to running certain applications.
Now, you should have a basic understanding of what threads are in a CPU. If you want to run games at playable frame rates, try a multithreaded or hyper-threaded CPU, which can also handle the encoding tasks related to recording and streaming.
Multicore and multithreading in network processors
Universal high-performance processors combine various approaches to parallel computing as fully as possible, including multicore and multithreading. Because the effectiveness of multicore and multithreading is difficult to quantify, and because these factors are significantly interdependent in universal server processors, vendors characterize their products based on in-house preferences or their own traditions. Analysis of how integrated network processors are used, together with the specifics of the operating systems and application programs running on them, leads to the conclusion that for network processors multicore is preferable to multithreading. This conclusion is indirectly confirmed by the prevalence of the ARM architecture in network processors, whose patent holders conceptually reject multithreading in favor of multicore.
Performance has always been and remains one of the main characteristics of processors. It grows constantly as the microelectronics industry masters new technological frontiers. In addition, rather than relying only on advances in process technology, processor developers introduce various architectural and structural improvements, among which an important place is occupied by parallelism of computation in all its manifestations: superscalar execution, multithreading, multicore, and multiprocessing. In universal high-performance processors, all these approaches are combined as fully as possible. However, the levels of parallelism are interdependent, and in practice, because of their mutual influence, reasonable compromises must be found. The criteria for such compromises largely depend on the purpose and scope of the processor. One of these areas, which imposes specific requirements on the parallelism of computation, is the network infrastructure, in which integrated network processors (ICPs) have long taken root and continue to be used successfully.
A qualitative analysis of the relationship between multithreading and multicore, two widely used forms of computational parallelism in processors, makes it possible to identify the specific features of ICPs with respect to these forms and to evaluate the preferred directions for developing computational parallelism in them.
The relationship between multicore and multithreading
At the turn of the millennium, when microelectronics passed the milestone of a billion transistors on a silicon chip, the need for a new frontier was called into question within the traditional single-core processor paradigm. Highly integrated intelligent systems-on-a-chip (SoCs) promised to solve this problem. Such systems could combine in one microelectronic device a high-performance processor with memory, controllers for various input-output interfaces, and various specialized devices. However, the higher integration of SoCs and the inclusion of specialized hardware inevitably narrowed their specialization, which limited their areas of application and, as a result, potential production volumes. Lower production volumes raised the cost of products, negated the advantages of high integration, and ultimately cast doubt on the economic feasibility of developing microelectronics in this direction. Only a few categories of SoCs managed to establish themselves in the microelectronics market. ICPs confidently entered their ranks thanks to growing mass demand, driven by the constant expansion of the worldwide network infrastructure and the ongoing intellectualization of its components.
Meanwhile, in the new millennium, the stalled process of increasing processor integration gained a second wind. The impetus, perhaps somewhat unexpected at first glance, was the development of widely available wireless communications, which gave rise to cloud computing. The cloud infrastructure, provided as a service to numerous users of various gadgets, required huge amounts of processor resources, compactly concentrated in the servers of cloud data centers. Thus, one of the important indicators of the economic efficiency of data centers became the density of processors, including high-performance ones, per unit volume of server space. In a competitive pursuit of this indicator, server manufacturers began to rapidly increase the number of processors in their machines, and processor suppliers began increasing the number of cores in products aimed at server applications. The current achievements of leading companies in the field of multicore server processors are presented in the table.
| Company | Processor family | Maximum number of cores | Number of threads per core | Maximum total number of threads |
|---|---|---|---|---|
* Available in (24×4) or (12×8) thread configurations
It is no coincidence that multithreading characteristics are included in the table. Companies offering multithreaded processors prefer to emphasize the maximum total number of threads as the most impressive characteristic of their products for advertising purposes. Moreover, multiple threads are often presented as "logical processors" or "virtual cores". However, calling a thread a logical processor or a virtual core is only acceptable in marketing brochures.
Unlike processors and cores, threads share the caches of all levels, both data caches and address translation (TLB) caches. As a result, multithreading has a tangible effect only when all the threads executed by a core belong to the same program. Parallelizing the threads of one program across several processor cores can speed up its execution, provided the threads have weak data interdependence. However, running several programs in different threads on the same core will only make things worse, due to constant conflicts in the caches, including the TLBs, since different programs run in different virtual address spaces. It is therefore not surprising that, when running test problems programmed for different numbers of threads on a multithreaded processor core, the highest per-core performance was often achieved with a single thread [3, 4].
The integration of multiple cores on a single die gave the processor high potential performance and a low cost per unit of performance. In turn, the availability of high-performance multicore processors, coupled with the ease of increasing their number in servers thanks to multiprocessor socket and blade technologies, largely stimulated the use of servers for tasks not previously characteristic of them, such as virtualization of network functions. It also encouraged the active use in data centers of various forms of virtualization and software definition of their main functional components: software-defined networking and software-defined storage. These useful features come almost for free, since the additional overhead of virtualization or software definition is often covered by the technological surplus of server resources.
Efficiency of multi-threading
At present, processor speeds so far exceed main-memory speeds that much of the processor's time is often wasted waiting for data requested from memory. The same idle time can occur when accessing other shared resources. The key advantage of multithreading is the ability to switch to another thread while the current one is forced to idle for one reason or another. Switching to an alternative thread requires a thread context switch, which a traditional processor without special multithreading support has to perform in software, and it can spend more time on the switch than it loses idling while waiting for data. To benefit from multithreading, alternative threads must be ready for immediate execution, and it is desirable to switch between them instantly. This problem is solved by simultaneous multithreading (SMT). SMT processors are equipped with special hardware that provides each thread with its own private context, typically including a set of general-purpose registers and means of prefetching instructions for that thread. Such additions are believed to increase the processor hardware only insignificantly, since most of it, including the execution pipelines, coprocessors, and caches of all levels, remains common to all threads.
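SMT itself is a hardware mechanism, but the latency-hiding idea it exploits can be sketched in software: while one thread is stalled waiting, another makes progress. In this illustrative Python snippet (the 0.2-second sleep merely stands in for a stall on memory or I/O), four waits overlap instead of adding up:

```python
import threading
import time

def wait_for_data():
    time.sleep(0.2)  # stands in for a stalled access to memory or I/O

start = time.perf_counter()
threads = [threading.Thread(target=wait_for_data) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start
# The four 0.2 s waits overlap, so the total is close to 0.2 s,
# not the 0.8 s that running them one after another would take.
print(f"4 overlapped waits finished in {elapsed:.2f} s")
```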
In general, it is not possible to quantify the effectiveness of SMT. It depends on too many factors: the features of the algorithm being executed, how it is programmed, and how multithreading is implemented in the processor hardware and the operating system. For SMT to speed up the execution of a program, a number of conditions must be met:
- the original algorithm must contain parallelism that allows simultaneous execution of several actions;
- in the working code, the programmer or compiler must organize the threads corresponding to these actions;
- when executing the code, threads must not be slowed down by contention for shared hardware resources or by accesses to shared data.
The difficulty of objectively assessing the effectiveness of SMT shows up indirectly, but tellingly, in the differing and apparently largely intuitive approaches server processor vendors take to implementing it. While Oracle in the SPARC family and IBM in the POWER family traditionally equip their processor cores with eight threads, Intel in all processors with its proprietary Hyper-Threading technology limits itself to two threads per core. Its direct competitor, AMD, adheres to the same norm (see table). Of course, an increase in the number of SMT threads should, under favorable circumstances, give a performance gain. For example, according to some estimates, in the eight-threaded POWER5 processor SMT can increase total performance by up to 60%, while in the two-threaded Xeon the maximum gain due to SMT is 30%. The question, however, is whether the hardware and energy costs of the additional threads are economically justified by the resulting overall increase in performance. There is hardly a single answer for all cases, but given the massive use of Intel processors across the widest range of applications, far wider than for SPARC or POWER processors, it is difficult to suspect the company of incompetence or of neglecting user needs. A more likely assumption is that in-house estimates of the effect of SMT do not induce Intel to raise the number of threads supported by a processor core above two.
ARM takes an even more radical position on SMT. It believes it is more profitable to replicate entire processor cores on a chip than to equip them with multithreading facilities. According to proponents of the ARM architecture, although a quad-core processor occupies a larger silicon area, it consumes less power than a single-core four-threaded processor, with a gain of up to 46%. In addition, executing a program with two threads on one core increases the number of cache hits by 42% compared to executing a single thread, while running the same program on two cores, on the contrary, reduces the number of cache hits by 37%. Probably following these estimates, ARM has almost completely abandoned SMT in its processors. To date, there is only one multithreaded embodiment of the ARM architecture in the world: the original proprietary implementation in the ThunderX2 processor family from Marvell (see table), although the previous generation of these processors, ThunderX (developed by Cavium), was traditionally single-threaded.
Peculiarities of multicore and multithreading in ICP
For all the variety of ICPs, they can be divided into two large categories according to the specialization of their processor core architectures. Specialized and, as a rule, extremely simplified cores make it possible, other things being equal, to increase the overall performance of an ICP both through the greater efficiency of each individual core and through the possibility of integrating more of them into an SoC at the same process technology level. Examples of ICPs with original architectures and specialized cores are the products of Netronome, Tilera (now Mellanox), EZchip (also now Mellanox), and Xelerated (now Marvell). These products likely include the Cisco nPower ICP from one of the giants of the network equipment market, which achieved a record level of integration with 672 processor cores of a proprietary architecture.
However, universal processor cores with traditional architectures, above all the ARM architecture, are more widely and more often used in ICPs. Examples include ICPs from Motorola/Freescale (now NXP), LSI (now Intel), and Cavium (now Marvell). The observed and quite tangible shift in ICP developers' preferences from specialized to universal processor cores is understandable. The former are difficult to program because of their architectural specificity and the limited software development tools available only to holders of the architecture's know-how, while the latter have a much more developed ecosystem with a variety of ready-made operating environments and libraries, as well as convenient, proven development tools, including ones from multiple third-party vendors. As a result, standard processor core architectures in ICPs make it much easier for consumers to create their own applications, reducing development costs and the time to market for new products.
Although general-purpose processor cores are increasingly used in ICPs, they are not the same cores used in server processors. Today there are practically no examples of x86 cores in ICPs. The main (if not the only) advantage of the latter lies in the huge amount of accumulated server software, but a number of its essential components, such as virtualization or software definition of servers, are not relevant to ICPs. At the same time, the ARM and MIPS architectures widely used in ICPs make it possible to place more processor cores on a chip and achieve not only higher performance but also better energy efficiency.
In addition, approaches to organizing caches differ noticeably between server processors and ICPs. In server processors, to maximize the independence of the cores from the shared memory interface and to bring the functionality of each core closer to that of a standalone processor, each core is equipped with its own second-level (L2) cache. For ICPs, a cluster organization of cores, with one L2 cache shared by all cores of a cluster, may be preferable. The figure illustrates the difference in cache organization.
Figure. Differences in the organization of caches in server processors and ICPs
The role of heterogeneous clusters in processor structure deserves separate mention. According to some estimates, at the level of the execution pipeline, hardware gives the maximum return when it is heterogeneously clustered in accordance with the requirements of the application. Admittedly, it remains unclear how one can guess the requirements of applications in advance when creating a server processor that may run a wide variety of tasks in different operating environments. ICPs are another matter: their range of applications is limited, and the applications' characteristics really are known in advance. Moreover, in ICPs, heterogeneous, functionally specialized clustering of processor cores is also desirable at the architectural level, since it allows the most efficient use of the L2 caches and of specialized ICP hardware.
In the example in the figure, three clusters are implemented on processor cores of two types; they have caches of different sizes and include different numbers of cores, with different specialized hardware added in the cores and clusters. Thus, it is in ICPs that heterogeneous clustering can show its advantages to the fullest.
ICPs do not impose any specific requirements on the multithreading of processor cores. In practice, however, SMT in an ICP may simply be redundant and result in wasted hardware costs, for two reasons. First, typical ICP applications involve hard real-time operation using appropriate operating environments, and real-time operating systems have traditionally been designed for multitasking rather than multithreading. Second, an ICP is characterized by a fixed set of working code, usually relatively small, which fits entirely in the clusters' L2 caches or is stored in special local memory (tightly coupled memory) of the processor cores. Therefore, in a typical ICP, especially a clustered one, the execution of an individual program thread is not slowed down by accesses to external memory, and multithreading loses its main advantage and, with it, its point. This feature of ICPs serves as an additional weighty argument in favor of using processor cores of the ARM architecture, with its conceptual rejection of multithreading in favor of aggressive multicore.
Growing performance requirements force processor developers to use all manner of methods to improve performance, including parallelizing computation through both multithreading and multicore. Although processor manufacturers sometimes advertise multithreading in their products as an alternative to multicore, a thread is not equivalent to a processor core, much less a processor, and multithreading does not replace multicore. In the absence of universal measures of the effectiveness of multithreading, or of confidence in its unconditional efficiency, developers at different companies are guided by intuition, in-house assessments, and tradition when choosing technical solutions for their processor cores. As a result, processor cores with different numbers of threads are on the market, and there are also examples of multithreading being rejected on principle.
If multithreading is not a replacement for multicore, multicore can in many cases successfully replace multithreading. It is more universal, since it can parallelize both the threads of one task and entire tasks. Moreover, even when parallelizing threads, it can be more advantageous, in particular in terms of power consumption.
Multicore is especially successful, including as an alternative to multithreading, in ICPs. The internal organization of ICPs and the specifics of the software running on them objectively limit multithreading's ability to demonstrate its best properties, while allowing the benefits of multicore to manifest fully. It is no coincidence that the ARM architecture, whose patent holders give unconditional preference to multicore over multithreading, finds ever wider use in ICPs.
Processors, Threads, and Processes: Is the PC Looking for a Multi-Core Future?
For many decades, all processors had only one core and could run only one thread at a time. It took a long time before the first dual-core CPUs appeared; now a home computer can have 8, 12, 16 or more cores, and modern processors can work on many threads at once, all thanks to advances in chip design and manufacturing. But what exactly is a thread, and why does it matter that a CPU can handle more than one? In this article, the reader will find the answer to these and other questions.
- 1 What is a thread?
- 2 CPU
- 3 Multitasking
- 4 Video cards
- 5 More threads needed
- 6 Why don't games use many threads?
- 7 Is multithreading the future?
- 8 Summary
What is a thread?
Simply put, a processor thread is the shortest schedulable sequence of instructions needed to complete part of a computational task. It may be a very short list, or a huge one; what determines that is the process the thread belongs to, as we'll see below.
So now we have a new question to answer (what is a process?), but luckily it's just as easy to resolve. If you are using Windows, press Windows + X and select Task Manager from the menu that appears.
By default it opens on the Processes tab, where you should see a long list of the processes currently running on your computer. Some of them are stand-alone programs that run without user input.
Others are applications you control directly, and some of those spawn additional background processes: tasks performed behind the scenes at the direction of the main program.
If you switch to the Performance tab in Task Manager and then select the CPU section, you will see how many processes are currently running as well as the total number of active threads.
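The relationship between a process and its threads can also be illustrated in code. Below is a minimal Python sketch (the worker function and its names are purely illustrative): the interpreter itself is the process, and it spawns several worker threads, each running its own short sequence of instructions.

```python
import threading

def worker(results, index):
    # Each thread executes its own small instruction sequence.
    results[index] = index * index

results = [None] * 4
threads = [threading.Thread(target=worker, args=(results, i)) for i in range(4)]

for t in threads:
    t.start()

# While the workers are alive, this one process briefly contains
# five threads: the main thread plus the four workers.
for t in threads:
    t.join()

print(results)  # [0, 1, 4, 9]
```

Tools like Task Manager and Process Explorer are showing you exactly this structure, only for every process on the machine at once.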
Every time a process wants to access a file, whether in RAM or in storage, a file handle (descriptor) is created. Each handle is unique to the process that created it, so a single file can have multiple handles open at once.
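You can see this directly from a script. In the Python sketch below (the file path is illustrative), the same file is opened twice, producing two distinct descriptors, each tracking its own position in the file:

```python
import os
import tempfile

# Create a scratch file to open (the path is illustrative).
path = os.path.join(tempfile.gettempdir(), "handle_demo.txt")
with open(path, "w") as f:
    f.write("hello")

# Two independent opens of the same file yield two distinct
# descriptors, each with its own read position.
fd1 = os.open(path, os.O_RDONLY)
fd2 = os.open(path, os.O_RDONLY)

first = os.read(fd1, 2)   # advances fd1's offset only
second = os.read(fd2, 5)  # fd2 still starts at offset 0

print(fd1 != fd2, first, second)  # True b'he' b'hello'

os.close(fd1)
os.close(fd2)
os.remove(path)
```

The two reads do not interfere with each other, which is the whole point of per-process handles.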
Returning to threads, the Task Manager doesn’t say much about them—for example, the number of threads associated with each process is not shown. Luckily, Microsoft has another program called Process Explorer to help with this.
Here we can see a much more detailed overview of the various processes and their threads.
Notice how some programs generate a relatively small number of instruction sequences (the Corsair iCUE plugin host, for example, has only one), while others, such as System, run hundreds of them.
Most of these threads are generated by the operating system itself, which then creates and manages them on its own.
The final destination for any thread is the central processing unit (CPU). This device takes a list of instructions, translates them into a «language» that it understands, and then performs the assigned tasks.
Deep inside the processor, dedicated hardware buffers incoming threads for analysis and sorts their instruction lists to best match what the processor is doing at that moment.
Even the old Pentium processors could reorder thread instructions slightly to maximize performance. Modern CPUs contain extremely complex thread-management machinery because of the sheer number of threads they have to juggle.
If the thread contains a sequence of «if…then…else» instructions, a branch-prediction circuit evaluates the most likely outcome. Acting on that guess, the CPU fetches the instructions the predicted path requires from its instruction store and executes them speculatively.
If the «prediction» was correct, a significant amount of time is saved, because the CPU never had to stop and wait for the condition to resolve. If not, the speculative work is discarded, which is why processor designers work so hard on prediction accuracy. A modern processor independently selects the most useful stream of instructions to work on at any given moment.
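One classic prediction circuit is easy to sketch: the 2-bit saturating counter (one of several real schemes; this toy model is not any particular CPU's implementation). It only flips its prediction after two consecutive mispredictions, so a single anomaly in an otherwise stable branch is tolerated:

```python
def predict_run(outcomes):
    """Simulate a 2-bit saturating-counter branch predictor.

    States 0-1 predict not-taken, states 2-3 predict taken.
    Returns the number of correct predictions for `outcomes`,
    a list of booleans (True = branch actually taken).
    """
    state = 0      # start strongly not-taken
    correct = 0
    for taken in outcomes:
        prediction = state >= 2
        if prediction == taken:
            correct += 1
        # Nudge the counter toward the actual outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct

# A typical loop branch: taken nine times, then not-taken at loop exit.
history = [True] * 9 + [False]
print(predict_run(history))  # 7 of 10 predicted correctly
```

After two warm-up mispredictions the predictor locks onto the loop pattern, which is exactly the behavior that makes the time savings described above possible.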
Intel server processors of the first half of the 90s
Central processors of the 1990s, whether desktop or server, had only one core and could therefore run only one thread at a time, although within that thread they could execute several instructions simultaneously (a property known as superscalar execution).
Multitasking
High-end servers and workstations have always had to juggle a huge number of threads, and Pentium-era machines typically carried two physical CPUs to handle the workload; modern servers still place multiple processors on a single motherboard. So the idea that a computer can work on multiple threads at the same time has been around for quite a while.
The idea of a single CPU core executing instructions from more than one thread at once, known as simultaneous multithreading (SMT), had to wait much longer: it took years for hardware capabilities to make such a technology practical.
Intel’s Northwood architecture brought multithreading to the masses. 1 core, 2 threads in the Intel Pentium 4
That moment came in 2002, when Intel released a new version of the Pentium 4. It was the first desktop CPU with full SMT support, marketed as Intel Hyper-Threading, and all modern implementations are its heirs.
So how does one processor core work on two threads at the same time? Think of the core as a complex factory made up of several stages: raw materials (data) are received and organized, orders (threads) are sorted, and each order is broken down into many smaller tasks.
Just as a high-volume car production line works on different parts one or two at a time, the CPU must perform different tasks in a set sequence in order to execute a given set of instructions.
Because the core works like this conveyor, its various stages will not always be busy: some data simply has to wait until earlier steps are completed.
This is where SMT (simultaneous multithreading) comes into play. Hardware designed to track the state of each part of the «pipeline» determines whether another thread can make use of the idle stages without stalling the current thread.
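The effect can be sketched with a toy scheduler (a deliberately simplified model, not how real issue logic works). Each thread is a list of instruction costs; a cost greater than 1 means the instruction stalls its thread for that many cycles, and the SMT scheduler issues from whichever thread is ready:

```python
def run_serial(*threads):
    """One thread after another: every stall cycle leaves the core idle."""
    return sum(sum(costs) for costs in threads)

def run_smt(t1, t2):
    """Single-issue core with 2-way SMT: each cycle, issue from any
    thread that is not waiting out a stall; idle only if both wait."""
    threads = [list(t1), list(t2)]
    wait = [0, 0]          # cycles until each thread is ready again
    cycles = 0
    while any(threads) or any(wait):
        for k in (0, 1):
            if wait[k] == 0 and threads[k]:
                # Issue the next instruction; it occupies this thread
                # for `cost` cycles (think of a slow memory access).
                wait[k] = threads[k].pop(0)
                break
        cycles += 1
        wait = [max(w - 1, 0) for w in wait]
    return cycles

# 1 = ordinary instruction, 4 = instruction that stalls for 4 cycles.
thread_a = [1, 4, 1, 4, 1]
thread_b = [1, 4, 1, 4, 1]

print(run_serial(thread_a, thread_b))  # 22 cycles: stalls wasted
print(run_smt(thread_a, thread_b))     # 13 cycles: stalls overlapped
```

The second thread's ready instructions slot into cycles the first thread would otherwise waste, which is precisely the payoff SMT is after.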
The fact that desktop processors became multi-threaded long before they became multi-core shows that SMT is far easier to implement: in the Intel Northwood design, less than 5% of the entire die was used to manage the second thread.
CPU cores that support SMT are organized so that they appear to the operating system as separate logical cores. Physically they share the same resources, but they operate independently.
Desktop CPUs process no more than two threads per core because their pipelines are relatively short and simple, and the designers' analysis evidently showed two to be the practical limit. That is why we still do not see configurations like 8 cores and 24 threads in home computers.
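You can see these logical cores from software. In Python, for example, `os.cpu_count()` reports the logical processors the operating system schedules threads onto; on an SMT machine that is typically twice the physical core count, though the ratio depends on the CPU:

```python
import os

# Logical processors = physical cores x threads per core (with SMT).
# On a 4-core CPU with 2-way SMT this typically reports 8.
logical = os.cpu_count()
print(f"logical CPUs visible to the OS: {logical}")
```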
IBM Power10 CPU — 15 SMT8 cores
On the opposite end of the spectrum, huge server processors like the Intel Xeon Phi chips or the latest IBM POWER processors handle 4 and 8 threads per core, respectively. This is because their cores contain many pipelines with shared resources.
These different approaches to CPU design arise from the very different workloads the chips have to deal with.
Video cards
CPUs are not the only chips in a computer that have to deal with a large number of threads. There is one chip with a very specific role that handles thousands of threads at the same time.
When it comes to sheer thread counts, CPUs lose outright to video cards. GPUs are physically larger, contain many more transistors, consume more power, and process far more threads than any server processor.
An entry-level graphics card is faster than a modern 32-thread AMD Ryzen processor
Take, for example, an AMD Radeon RX 6800 graphics card, built around the Navi 21 chip. This processor contains 60 compute units (CUs), each of which can work on 64 separate threads simultaneously. That's 3840 threads!
So how does a GPU manage so many more threads than a CPU?
Each CU has two SIMD (single instruction, multiple data) units, and each unit can work on 32 separate data elements at the same time. Those elements can all come from different threads, but the catch is that the unit must execute the same instruction on every one of them.
This is a key difference from the CPU: the two threads a desktop core processes can carry completely different instructions from completely unrelated processes.
GPUs are designed to do the same things over and over again, usually within the same programs (technically known as kernels, but we'll leave that aside), with everything done in parallel.
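This lockstep execution can be sketched in plain Python (an illustrative model, not GPU code): one «instruction» is applied across a whole vector of lanes at once, and per-lane `if` logic has to be expressed as a mask rather than as divergent branches:

```python
LANES = 32  # the width of one SIMD unit in this sketch

def simd_apply(op, *vectors):
    """Execute the same instruction `op` on every lane at once."""
    assert all(len(v) == LANES for v in vectors)
    return [op(*lane) for lane in zip(*vectors)]

a = list(range(LANES))
b = [2] * LANES

# All 32 lanes run the same multiply in lockstep.
products = simd_apply(lambda x, y: x * y, a, b)

# Branches become masks: every lane evaluates the same select.
mask = simd_apply(lambda x: x % 2 == 0, a)
result = simd_apply(lambda m, x: x if m else -x, mask, products)

print(products[:4])  # [0, 2, 4, 6]
print(result[:4])    # [0, -2, 4, -6]
```

A CPU core would happily run 32 unrelated instruction streams here; the SIMD unit gets its throughput precisely by refusing to.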
Like the IBM POWER10, an enterprise-server-only processor, the graphics card is built for a very specialized task.
The largest modern games, with their complex 3D scenes, demand an incredible number of mathematical calculations within just a few milliseconds, and that calls for a huge number of threads.
More Threads Needed
If you take a look at almost any CPU review, you will see two results from Cinebench, a benchmark that performs the demanding task of CPU-based rendering.
One result comes from a test using only a single thread, while the other uses as many threads as the CPU can handle, and the multi-threaded result is always far better. Why is that?
Cinebench renders a 3D scene much as a game would, only as a single frame in high detail. If you recall how GPUs execute many threads in parallel to create 3D graphics, it becomes obvious why processors with many cores, especially with SMT, handle this workload so well: rendering splits naturally into independent pieces. It is one of the few scenarios where all of a CPU's cores and threads can be fully utilized.
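That «splits naturally» claim can be demonstrated with a toy renderer (the shading function is a stand-in, not Cinebench's algorithm): the frame's rows are independent of one another, so they can be farmed out to a pool of worker threads and still produce an identical image:

```python
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT = 64, 48

def shade(x, y):
    # Stand-in for an expensive per-pixel computation.
    return (x * x + y * y) % 256

def render_row(y):
    return [shade(x, y) for x in range(WIDTH)]

# Serial reference render.
serial = [render_row(y) for y in range(HEIGHT)]

# Parallel render: each row is an independent task, so rows can be
# computed by different threads; pool.map preserves row order.
with ThreadPoolExecutor(max_workers=8) as pool:
    parallel = list(pool.map(render_row, range(HEIGHT)))

print(parallel == serial)  # True: same image either way
```

In CPython the GIL limits the speedup for pure-Python math; the point here is the decomposition itself, which is what lets real renderers scale across every core and thread available.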
Unfortunately, adding more cores makes the chip bigger and therefore more expensive, so SMT, which adds threads almost for free, would seem to be an unambiguous win. In practice, much depends on the workload.
For example, the AMD Ryzen 9 3950X (a 16-core processor with 32 threads) shows varied results across 36 games with SMT enabled and disabled: some games run 10-16% faster with SMT on, while others run 10-12% slower.
The average difference, however, was only about 1%, so it is certainly not the case that SMT should always be disabled for gaming. But it does raise a few more questions.
First, why would a game run 12% slower when the CPU's cores are processing two threads at once? The key phrase here is «contention for resources».
The more threads a CPU can run, the more important its caching system becomes. This is apparent when examining processors whose L3 cache size stays fixed no matter how many cores are enabled: the more cores and threads a chip has, the more cache requests the system must service.
And that brings us to the next question: is this why most games can't make use of many threads and cores?
Why don’t games use many threads?
Let's go back to Process Explorer and look at a few games: Cyberpunk 2077, Spider-Man Remastered and Shadow of the Tomb Raider. All three were developed for both PC and consoles, so you would expect them to use somewhere between 4 and 8 threads.
At first glance the games clearly do use multiple threads, but that is about all we can tell at this level, because the processor in the test machine supports a maximum of 8 threads. If we dig deeper into each process's threads, we get a much clearer picture. Let's take a look at Shadow of the Tomb Raider.
Below we can see that the vast majority of these threads take up almost no CPU time (second column, shown in seconds). Although the process and the OS have created over a hundred threads, most of them finish too quickly to even register.
«Delta cycles» is the total number of CPU cycles a thread of the process has accumulated, and in this game the count is dominated by just two threads, though the rest still keep all the available processor cores busy.
The cycle counts may look like absurd numbers, but if the processor runs at, say, 4.5 GHz, one cycle takes only about 0.22 nanoseconds, so 1.3 billion cycles corresponds to just under 300 milliseconds.
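The conversion is simple arithmetic, sketched here for the 4.5 GHz figure used above:

```python
clock_hz = 4.5e9                 # 4.5 GHz clock
cycle_time_s = 1 / clock_hz      # duration of one cycle in seconds

cycles = 1.3e9                   # cycles accumulated by a busy thread
elapsed_s = cycles * cycle_time_s

print(f"{cycle_time_s * 1e9:.3f} ns per cycle")  # 0.222 ns
print(f"{elapsed_s * 1e3:.0f} ms of CPU time")   # 289 ms
```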
Of course, not all games can do this, and the older the project, the fewer threads it uses. If we look at the original Call of Duty from 2003, we see a very different picture.
All games of that era were like this: one thread for everything, because processors at the time had only one core and relatively few of them supported SMT.
So while the Call of Duty process gets by with a single thread, Shadow of the Tomb Raider is properly multi-threaded, spreading its work across as many threads as the CPU supports.
Initially the hardware was ahead of the software when it came to fully using all the cores on offer (with or without SMT), and it took several years before games became fully multi-threaded.
Now that the latest consoles ship with 8-core processors supporting 2-way SMT, future games will certainly use even more threads.
Is multithreading the future?
Today you can get a desktop PC with a CPU that handles 32 threads (AMD Ryzen 9 7950X) and a GPU that juggles thousands of threads at once (Nvidia GeForce RTX 4090).
This hardware sits at the leading edge of technology, cost, and power consumption, and is certainly not representative of what most computers offer. But about ten years ago the picture was very different.
Back then the best processors supported 8 threads via SMT, while the average PC usually made do with about 4. Now you can get a processor for under $100 that performs like the best chips of 7 years ago.
4 cores, 8 threads, less than $100 — The Intel Core i3-10100 is about the same performance level as the Intel Core i7-7700.
We can thank AMD for this: they were the first to offer many cores and threads at an affordable price, and the AM4 platform revolutionized the home PC. Today both manufacturers regularly fight over who can offer the most cores and threads per dollar.
We are now at a stage where new games take nearly full advantage of all the processing power available to them, unless they are limited by the GPU.