
Processor and Memory Affinity Tools » ADMIN Magazine

It’s called high-performance computing (HPC), not low-performance computing (LPC), not medium-performance computing (MPC), and not even really awful-performance computing (RAPC). The focus is doing everything possible to get the highest performance possible for your applications.

Needless to say, but I will say it anyway, processors and systems have gotten very complicated. Individual CPUs can have 64+ cores, and this number is growing. They are being packaged in different ways, including multichip modules with memory controllers connected in various locations, multiple memory channels, multiple caches sometimes shared across cores, chip and module interconnections, network connections, Peripheral Component Interconnect Express (PCIe) switches, and more. These elements are connected in various ways, resulting in a complex non-uniform memory access (NUMA) architecture.

To get the best possible performance, you want the best bandwidth and least latency between the processing elements and between the memory and processors. You want the best performance from the interconnect between processing elements, the interconnect among processing and memory elements and accelerators, and the interconnect among the processors and accelerators to external networks. Understanding how these components are connected is a key step for improving application performance.

Compounding the challenge of finding the hardware path for best performance is the operating system. Periodically, the operating system runs services, and sometimes the kernel scheduler will move a running process from one core to another as a result. Your carefully planned hardware path can thus be disrupted, resulting in poor performance.

I have run all types of code on my workstation and various clusters, including serial, OpenMP, OpenACC, and MPI code. I carefully watch the load on each core with GKrellM, and I can see the scheduler move processes from one core to another. Even when I leave one or two cores free for system processes, in the hope that processes won’t be moved, I still see them move from one core to another. In my experience, serial code stays on a particular core for only a few seconds before being moved to another.

When a process move takes place, the application is “paused” while its state moves from one processor to another, which takes time and slows the application. After the process is moved, it could be accessing memory from another part of the system that requires traversing a number of internal interconnects, reducing the memory bandwidth, increasing the latency, and negatively affecting performance. Remember, it’s not LPC, it’s HPC.

Fortunately, Linux has developed a set of tools and techniques for “pinning” or “binding” processes to specific cores while associating memory to these cores. With these tools, you can tell Linux to run your process on very specific cores or limit the movement of the processes, as well as control where memory is allocated for these cores.

In this article, I present tools you can use for binding processes, and in subsequent articles, I show how they can be used with OpenMP and MPI applications.

Example Architecture

I’ll use a simple example of a single-socket system with an AMD Ryzen Threadripper 3970X CPU that has simultaneous multithreading (SMT) turned on.

A first step in understanding how the processors are configured is to use the command lscpu. The output of the command on the example system is shown in Listing 1. The output notes 64 CPUs and two threads per core, indicating that SMT is turned on, which means 32 “real” cores and 32 SMT cores.

Listing 1: lscpu

$ lscpu
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   43 bits physical, 48 bits virtual
CPU(s):                          64
On-line CPU(s) list:             0-63
Thread(s) per core:              2
Core(s) per socket:              32
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       AuthenticAMD
CPU family:                      23
Model:                           49
Model name:                      AMD Ryzen Threadripper 3970X 32-Core Processor
Stepping:                        0
Frequency boost:                 enabled
CPU MHz:                         2198.266
CPU max MHz:                     3700.0000
CPU min MHz:                     2200.0000
BogoMIPS:                        7400.61
Virtualization:                  AMD-V
L1d cache:                       1 MiB
L1i cache:                       1 MiB
L2 cache:                        16 MiB
L3 cache:                        128 MiB
NUMA node0 CPU(s):               0-63
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full AMD retpoline, IBPB conditional, STIBP conditional, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_ts
                                 c cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignss
                                 e 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate sme ssbd mba sev ibpb stibp vmmcall fsgsbase bmi1 avx2 sme
                                 p bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr wbnoinvd arat npt lbr
                                 v svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip rdpid overflow_recov succor smca

Also note the single socket and one NUMA node. The output also lists the L1d (data) cache as 1MiB, the L1i (instruction) cache as 1MiB, the L2 cache as 16MiB, and the L3 cache as 128MiB. However, it doesn’t tell you how the caches are associated with cores.

One way to get most of this information in a more compact form is shown in Listing 2.

Listing 2: Compact lscpu

$ lscpu | egrep 'Model name|Socket|Thread|NUMA|CPU\(s\)'
CPU(s):                          64
On-line CPU(s) list:             0-63
Thread(s) per core:              2
Socket(s):                       1
NUMA node(s):                    1
Model name:                      AMD Ryzen Threadripper 3970X 32-Core Processor
NUMA node0 CPU(s):               0-63

One important question to be answered is: Which cores are “real,” and which cores are SMT? One way is to look at the /sys filesystem for the CPUs:

$ cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
0,32

If the first number in the output is equal to the CPU number in the command, then it’s a real core. If not, it is an SMT core. For the example command, the CPU number in the command is 0 and the first number is also 0, which makes it a real core.

Now try the command on a few other CPUs (Listing 3). The first command looks at CPU 1, and it’s a real core (the CPU number is 1, and the first number in the output is 1, which matches). CPUs 30 and 31 are also both real cores. However, when the command is run on CPU 32, the first number in the output is 0. Because 0 does not match 32, it is an SMT core. The same is also true for CPU 33.

Listing 3: Real or SMT? Method 1

$ cat /sys/devices/system/cpu/cpu1/topology/thread_siblings_list
1,33
$ cat /sys/devices/system/cpu/cpu30/topology/thread_siblings_list
30,62
$ cat /sys/devices/system/cpu/cpu31/topology/thread_siblings_list
31,63
$ cat /sys/devices/system/cpu/cpu32/topology/thread_siblings_list
0,32
$ cat /sys/devices/system/cpu/cpu33/topology/thread_siblings_list
1,33

For an SMT core, the first number in the output also identifies the real core with which it is associated. For example, CPU 32 is associated with CPU 0 (the first number in the output), so CPU 0 is the real core and CPU 32 is the SMT core in the pair.
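This check is easy to script. The sketch below walks every CPU directory in /sys and classifies each logical CPU as real or SMT by comparing its number against the first sibling in thread_siblings_list (note that some kernels format the list with a dash, e.g., 0-1, so the script takes only the leading digits):

```shell
# Classify each logical CPU as a real core or an SMT sibling by comparing
# its number with the first entry in its thread_siblings_list.
for d in /sys/devices/system/cpu/cpu[0-9]*; do
    cpu=${d##*cpu}                                          # e.g., 32
    first=$(grep -o '^[0-9]*' "$d/topology/thread_siblings_list")
    if [ "$cpu" = "$first" ]; then
        echo "cpu$cpu: real"
    else
        echo "cpu$cpu: SMT (sibling of cpu$first)"
    fi
done
```

On the example system, this would report CPUs 0-31 as real and CPUs 32-63 as SMT siblings.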

Understanding the numbering of the real and SMT cores is important, but you have another way to check whether a CPU is real or SMT. Again, it involves examining the /sys filesystem (Listing 4). The output from the command is in pairs, listing the real CPU number first and the associated SMT CPU number last. The first line of the output says that CPU 0 is the real core and CPU 32 is the SMT core. It is essentially the same as the previous command, except that it lists all of the cores at once.

Listing 4: Real or SMT? Method 2

$ cat $(find /sys/devices/system/cpu -regex ".*cpu[0-9]+/topology/thread_siblings_list") | sort -n | uniq
0,32
1,33
2,34
3,35
4,36
5,37
6,38
7,39
8,40
9,41
10,42
11,43
12,44
13,45
14,46
15,47
16,48
17,49
18,50
19,51
20,52
21,53
22,54
23,55
24,56
25,57
26,58
27,59
28,60
29,61
30,62
31,63

The lstopo tool can give you a visual layout of the hardware along with a more detailed view of the cache layout (Figure 1). This very useful command returns the hardware layout of your system. Although it can include PCIe connections as well, I’ve chosen not to display that output.

Figure 1: lstopo output for the sample system.

Notice in the figure that each 16MB L3 cache has four groups of two cores. The first core in each pair is the real core and the second is the SMT core. For example, Core L#0 has two processing units (PUs), where PU L#0 is a real core listed as P#0 and PU L#1 is the SMT core listed as P#32. Each group of two cores has an L2 cache of 512KB, an L1d cache of 32KB, and an L1i cache of 32KB.

The eight L3 cache “groups” make a total of 64 cores with SMT turned on.
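If you don’t have the hwloc package (which provides lstopo) installed, lscpu can show the core pairing in tabular form; logical CPUs that share a CORE value are SMT siblings of the same physical core:

```shell
# One row per logical CPU; rows with the same CORE value are SMT siblings.
lscpu --extended=CPU,CORE,SOCKET,NODE
```

On the example system, CPU 0 and CPU 32 would both show CORE 0, matching the sibling pairs in Listing 4.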

Affinity Tools

In this article, I discuss two Linux tools that allow you to control the placement of application threads (processes), giving you great flexibility to achieve the performance you want. For example, a great many applications need memory bandwidth. These tools allow you to make sure that each thread gets the largest amount of memory bandwidth possible.

If network performance is critical to application performance (think MPI applications), with these tools, you can bind threads to cores that are close to a network interface card (NIC), perhaps not crossing a PCIe switch. Alternatively, you can bind processes to cores that are as close as possible to accelerators to get the maximum possible PCIe bandwidth.

The Linux tools presented here allow you to bind processes and memory to cores; you have to find the best way to use these tools for the best possible application performance.

taskset

The taskset command is considered the most portable Linux way of setting or retrieving the CPU affinity (binding) of a running process (thread). According to the taskset man page, “The Linux scheduler will honor the given CPU affinity and the process will not run on any other CPUs.”

An example of executing a process with the taskset command is:

$ taskset --cpu-list 0,2 application.exe

This command sets the affinity of application.exe to cores 0 and 2 and then executes it. You can also use the short version of the --cpu-list option, -c.

If you want to change the affinity of a running process, you need to get the process ID (PID) of the process and use the --pid (-p) option. For example, if you have an application with four processes (or four individual processes), you get the PID of each process and then run the following commands to move them to cores 10, 12, 14, and 16:

$ taskset --pid --cpu-list 10 [pid1]
$ taskset --pid --cpu-list 12 [pid2]
$ taskset --pid --cpu-list 14 [pid3]
$ taskset --pid --cpu-list 16 [pid4]
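As a quick end-to-end sketch, using sleep as a stand-in for a real application, you can start a process, pin it, and query its affinity:

```shell
# Start a placeholder process in the background.
sleep 30 &
pid=$!

# Bind the running process to core 0 (any core that exists will do).
taskset --pid --cpu-list 0 "$pid"

# Query the current affinity; taskset reports the allowed CPU list.
taskset --pid --cpu-list "$pid"

kill "$pid"
```

The query form (without a CPU list) prints a line such as "pid 1234's current affinity list: 0".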

numactl

One key tool for pinning processes is numactl, which can be used to control the NUMA policy for processes, shared memory, or both. Unlike taskset, however, numactl can’t change the policy of a running application. It can, though, display information about your NUMA hardware and the current policy (Listing 5). Note that for this system, SMT is turned on, so the output shows 64 CPUs.

Listing 5: numactl

$ numactl --hardware
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
node 0 size: 64251 MB
node 0 free: 60218 MB
node distances:
node 0
0: 10

The system has one NUMA node (available: 1 nodes), and all 64 cores are associated with that NUMA node. Because there is only one NUMA node, the distance from NUMA node 0 to itself is listed as 10, which indicates it’s the same NUMA node. The output from the command also indicates the node has 64GB of memory (node 0 size: 64251 MB).

The advantages of numactl come from its ability to place and bind processes, particularly in relation to where memory is allocated, for which it has several policies that are implemented as options to the command:

  • The --interleave=<nodes> policy has the application allocate memory in a round-robin fashion on “nodes.” On a system with two NUMA nodes, for example, memory would be allocated first on node 0, then node 1, node 0, node 1, and so on. If the allocation cannot be satisfied on the current interleave target (node x), it falls back to other nodes, in the same round-robin fashion. You can control which nodes are used for memory interleaving or use them all:
$ numactl --interleave=all application.exe

This example command interleaves memory allocation on all nodes for application.exe. Note that the sample system in this article has only one node, node 0, so all memory allocation uses it.

  • The --membind=<nodes> policy forces memory to be allocated from the list of provided nodes (including the all option):
$ numactl --membind=0,1 application.exe

This policy causes application.exe to use memory from node 0 and node 1. Note that a memory allocation can fail if no more memory is available on the specified node.

  • The --cpunodebind=<nodes> option causes processes to run only on the CPUs of the specified node(s):
$ numactl --cpunodebind=0 --membind=0,1 application.exe

This policy runs application.exe on the CPUs associated with node 0 and allocates memory on node 0 and node 1. Note that the Linux scheduler is free to move the processes to CPUs as long as the policy is met.

  • The --physcpubind=<CPUs> policy executes the process(es) on the list of CPUs provided:
$ numactl --physcpubind=+0-4,8-12 application.exe

This policy runs application.exe on CPUs 0-4 and 8-12. You can also specify all to use all of the CPUs.

  • The --localalloc policy forces allocation of memory on the current node:
$ numactl --physcpubind=+0-4,8-12 --localalloc application.exe

This policy runs application.exe on CPUs 0-4 and 8-12, while allocating memory on the current node.

  • The --preferred=<node> policy causes memory to be allocated on the node you specify, but if that is not possible, it falls back to using memory from other nodes. To set the preferred node for memory allocation to node 1, use:
$ numactl --physcpubind=+0-4,8-12 --preferred=1 application.exe

This policy can be useful if you want to keep application.exe running, even if no more memory is available on the current node.

To show the NUMA policy setting for the current process, use the --show (-s) option:

$ numactl --show

Running this command on the sample system produces the output in Listing 6.

Listing 6: numactl --show

$ numactl --show
policy: default
preferred node: current
physcpubind: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
cpubind: 0
nodebind: 0
membind: 0

The output is fairly self-explanatory. The policy is default, and the preferred NUMA node is the current one (this system has only one node). It then lists the physical CPUs the process may use (physcpubind), the node whose CPUs it is bound to (cpubind), the node binding (nodebind), and the node its memory allocation is bound to (membind), which is node 0 in each case.

The next examples show some numactl options that define commonly used policies. The first example focuses on running a serial application – in particular, running the application on CPU 2 (a non-SMT core) and allocating memory locally:

$ numactl --physcpubind=2 --localalloc application.exe

The kernel scheduler will not move application.exe from core 2 and will allocate memory using the local node (node 0 for the sample system).

To give the kernel scheduler a bit more freedom, yet keep memory allocation local to provide the opportunity for maximum memory bandwidth, use:

$ numactl --cpunodebind=0 --membind=0 application.exe

The kernel scheduler can move the process to CPU cores associated with node 0 while allocating memory on node 0. This policy helps the kernel adjust processes as it needs, without sacrificing memory performance too much. Personally, I find the kernel scheduler tends to move things around quite often, so I like binding my serial application to a specific core; then, the scheduler can put processes on other cores as needed, eliminating any latency in moving the processes around.
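One way to confirm that a binding such as this took effect is to read the kernel’s own record of the allowed CPUs from /proc. This sketch uses taskset with sleep standing in for application.exe, because it works even on systems where numactl is not installed:

```shell
# Launch a pinned placeholder process, then read Cpus_allowed_list,
# the kernel's record of which CPUs the process may run on.
taskset --cpu-list 0 sleep 30 &
pid=$!
grep Cpus_allowed_list /proc/$pid/status   # the list should read: 0
kill "$pid"
```

The same check works for a process started under numactl --physcpubind.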

In a subsequent article, I will have examples that show how to use numactl with OpenMP and MPI applications.

Tool for Monitoring CPU Affinity

Both taskset and numactl allow you to check on any core or memory bindings. However, sometimes they aren’t enough, which creates an opportunity for new tools. A good affinity monitoring tool, show_affinity, comes from the Texas Advanced Computing Center (TACC).

The tool shows “… the core binding affinity of running processes/threads of the current user.” The GitHub site has a simple, but long, output example from running the command (Figure 2).

Figure 2: Output of the TACC show_affinity tool (used with permission from the GitHub repository owner).

Summary

Today’s HPC nodes are complicated, with huge core counts, distributed caches, various memory connections, PCIe switches with connections to accelerators, and NICs, making it difficult to understand clearly where your processes are running and how they are interacting with the operating system. This understanding is extremely critical to getting the best possible performance, so you have HPC and not RAPC.

If you don’t pay attention to where your code is running, the Linux process scheduler will move it around, introducing latency and reducing performance. The scheduler can move processes into non-optimal situations, where memory is used from a different part of the system, resulting in much reduced memory bandwidth. It can also cause processes to communicate with NICs across PCIe switches and internal system connections, again resulting in increased latency and reduced bandwidth. This is also true for accelerators communicating with each other, with NICs, and with CPUs.

Fortunately, Linux provides a couple of tools that allow you to pin, also called binding or setting the affinity of, processes to specific cores along with specific directions on where to allocate memory. In this way, you can prevent the kernel process scheduler from moving the processes – or at least control where the scheduler can move them. If you understand how the systems are laid out, you can use these tools to get the best possible performance from your applications.

In this article, I briefly introduced two tools – taskset and numactl – along with some very simple examples of how you might use them, primarily on serial applications. A subsequent article will explain how you can use them with OpenMP and MPI applications.

PinAffinity




One of the best performance tweaks I’ve found for virtual pinball
is “CPU affinity”. It’s also one of the least known, probably
because there aren’t a lot of good tools for it. PinAffinity
is my attempt to fix that by offering a simple tool tailored
to virtual pin cabs, designed to be extremely simple to set
up, and completely automatic once configured.



CPU affinity optimizations aren’t about increasing your
“FPS” (frame rate) numbers. Your frame rate probably won’t
increase after tweaking affinities, although it might if you’re
using an older CPU and you’re close to maxing it out. But even a
machine with plenty of CPU power can see some benefit, because
FPS numbers only tell part of the story.


The big thing that affinity optimization can help with is
“stutter”. Stutter is a sort of jerky motion that
you see on the video display, sporadically, especially
when the machine is under a lot of load. In a pinball
emulator, you might see this during a multiball mode,
or when a lot of lighting effects are firing. You might
also hear audio glitching at these times.


Stutter can show up sporadically even if you have a fast
machine with high average FPS rates. Stutter happens during
momentary spikes in load, so it can be hard to get rid of
using the normal performance tuning techniques, which are all
about improving overall throughput. Making your machine faster
overall is always good, but stutter isn’t so much about overall
speed as about instantaneous responsiveness. And that’s where CPU
affinity can help. Affinity optimization can reduce stutter
by reducing thread scheduling latency, allowing video game
threads to respond more quickly when they have work ready to do.
This is especially important for video animation,
because animation has to keep up with the video refresh
cycle in order to look smooth and consistent. CPU
affinity tuning can reduce the competition for CPU resources from other
running programs, which can in turn allow VP to keep in closer
sync with real time as it does its physics computations and graphics
rendering.


Downloads



Use the “X-bit” version that matches your Windows operating system type.
If you’re not sure what you have, open your “System” control panel and
look at the “System type” field. It should say something like
“64-bit Operating System”.

  • PinAffinity 64-bit build (February 22, 2020)
  • PinAffinity 32-bit build (February 22, 2020)
  • Source code on github

Installation and basic use


Just download it and unzip the
files into a folder on your disk. There’s no need for a Setup
program or registry settings or anything like that; it’s all
self-contained in the one folder.


Double-click the program’s .exe file to run it. This will bring up a window
showing running processes on your system. The program is pre-configured
with the current Visual Pinball files set to «Pinball» mode. You can set
the modes for other running programs by right-clicking them and selecting
Set CPU Affinity Type from the menu. You can also set the type for a
program that’s not currently running using the Add Program menu command.
All changes are saved immediately and will be restored the next time you
run the program.


When you first install the program, you should check that the affinity settings are
working. To do this, run a game in Visual Pinball, and while the game is running,
go back to the PinAffinity window and check the CPU affinity column for the
VP executable you’re running (e.g., VPinballX.exe if you’re using VP 10). If
things are working, the Affinity column should show something like “-+++”.
If it says “Unable”, it means that Windows blocked the affinity change
due to privilege restrictions that apply to certain system processes.
If you want PinAffinity to control those
blocked processes, see Admin Mode below.


For more instructions, see the README file included with the download.


Automatic startup



You’ll probably want to set up PinAffinity so that it automatically
launches every time you start your system. There are several ways to
do this. The right way for you depends on whether or not you want
to run in Admin Mode:



User mode: If you only want to run PinAffinity with normal
“User” level privileges, launching at startup is easy. Simply create
a Windows Shortcut for PinAffinity in your Start Menu > Programs >
Startup folder. Do not enable the Run as Administrator
setting in the “Properties” dialog (for either the Shortcut or for the
.exe file itself) if you use a Startup shortcut. Windows security rules
prohibit that, and Windows will simply ignore the Startup shortcut.


Admin mode: The designers of Windows didn’t want to encourage
you to launch random software in Admin mode, so they didn’t make it
easy. But it can be done. Here’s the procedure:

  • Run Windows Task Scheduler (use Windows+S to search for it)
  • Click Create Task in the Actions pane
  • On the General tab, enter a name (say, “Run PinAffinity”)
  • Click the box “Run with the highest privileges”
  • On the Triggers tab, click New
  • In the “Begin the task” drop-down at the top, select “At log on”
  • Click OK
  • On the Actions tab, click New
  • Click Browse and select PinAffinity.exe
  • In the “Add arguments” box, type /MINIMIZE
  • Click OK
  • Click OK in the main dialog

Admin Mode




By default, Windows doesn’t allow PinAffinity to modify the affinity settings
for system processes (internal programs started by Windows itself) or for
programs launched by other users. When you see “Unable” in the affinity
setting column for a process, that means that the process is protected
against alteration by PinAffinity.



You can get around this by running PinAffinity in “Administrator Mode”,
which gives the program elevated privileges as though it were a system
process itself. By default, Windows runs most software with “User”
privileges, which restricts access to some internal system resources,
including the ability to mess around with system processes. This is
in part to protect your system against malware, and in part to protect
against simple programming errors by otherwise benign programs. Most
programs don’t need special privileges to do their jobs, so they won’t
even notice the restrictions. But the nature of PinAffinity’s work
does require Admin rights if you want the program to be able to control
affinities for system processes.



Should you run PinAffinity in Admin mode? If you want the
maximum optimization benefit, the answer is yes. But if you’re
uncomfortable giving elevated privileges to random open-source
software like this, you’ll still get much of the benefit running
in regular User mode.


Duration of affinity changes



The CPU affinity changes are only in effect while PinAffinity is actually running.
No permanent changes are made to your system, and all normal Windows
defaults are restored as soon as the program exits. That means you have
to leave PinAffinity running while playing VP games for its optimizations
to work. You should minimize the PinAffinity window while running VP games, to
eliminate the overhead of redrawing the window. The window minimizes to a
“system tray” icon for reduced desktop clutter; just click the icon to bring
the window back.


About CPU affinities



CPU affinity is a feature built into Windows that lets you
tell the operating system to assign a particular program to
a particular group of CPU cores. Most modern PC CPU chips have
multiple cores, which means that they actually consist of multiple
CPUs within the same chip, each of which can run its own separate
program at the same time that other programs are running on the
other cores.


Windows normally lets all programs
on your system share all of the available cores more or less
equally. CPU affinity lets you override this default equal
rationing by telling Windows to run certain programs only on
certain cores.


On a pin cab, we can use this to partition the
PC’s cores into two groups: a “Pinball” group and “everything
else”. Programs in the pinball group run on one set of cores,
and everything else runs on a separate set of cores.
That more or less eliminates competition from other programs
for the «pinball» cores, which lets the pinball software run
more smoothly.


This might not seem all that useful if you have a fast CPU.
VP probably isn’t coming close to 100% CPU saturation on your
machine, after all. But CPU usage percentage isn’t the only
thing that matters: another important factor is the scheduling latency
in the Windows thread scheduler. For video game software like VP, it’s
critical to get access to a CPU the instant there’s work to do. Video
games have to do their computing in real time because they have
to respond to outside events (flipper button presses, for example).
The Windows scheduler is optimized for overall throughput,
so it tries to balance load across programs. As a result, Windows
doesn’t always allow a program to run instantly when it has work
to do. Programs normally have to wait their turn. If all of the
CPU cores are busy with other work when VP is ready to render a
frame, Windows might not let VP run in time for the next video
refresh, and this can lead to visible motion artifacts like
“stutter”.


So if you’re wondering
why you sometimes get stutter on a machine where VP is only using
8% CPU, this is probably at least part of it, and installing
PinAffinity might help. Affinity settings help by reducing the
number of other programs competing for the cores that VP is
using, so there’s less of that waiting in line for other
programs to finish their work.


Other CPU affinity tools



I wrote PinAffinity because I couldn’t find a suitable existing tool to
recommend in the performance
optimization chapter of my new Pinscape Build Guide. However, that doesn’t
mean there’s nothing else that can do the job. There are some other tools that
can handle this; I just didn’t think they’d make good recommendations because
of drawbacks relating to availability and/or ease of use. With those
caveats in mind, here are some other options.


The best third-party option I’ve been able to find is Process Hacker
version 3. The name makes me rather queasy about setting it loose with Administrator
privileges, but setting aside the unfortunate name, it’s a solid open-source Task
Manager replacement, with features that exceed the
venerable Process Explorer from SysInternals. The key thing
that Process Hacker 3 can do, relevant to our purposes here, is to store
affinity settings persistently by process name, and apply them automatically to
new processes as they’re launched, just like PinAffinity does.
Unfortunately, Process Hacker 3 is cumbersome for this use because it doesn’t
have a global default affinity setting; you have to go
through all of the running processes to do the initial setup,
which is tedious to say the least.
Note that you need version 3 for the persistent affinity
feature; version 2 doesn’t have it. And version 3 isn’t
officially released as of this writing — it appears to be
a very long-running work in progress. It’s somewhat hidden on the Process
Hacker web sites, too; you have to find the “Nightly Build”
section, which is only linked from the “Download” page. The
whole Web site is said to be in beta, so I don’t want to offer
any deep links into it lest they be reorganized away by the
time you read this.


Another option is a freeware tool called
PriFinitty [sic on the doubled “t”]. You’ll notice I’m not
hyperlinking to it, which we’ll come to shortly. I was
actually using this on my own cab for a long time, and I
would have recommended it despite its cumbersome
user interface, but it turns out I can’t, because it’s no longer available.
PriFinitty was free but closed-source, and the developer seems
to have lost interest in it a long time ago and erased all
official traces from the Web. Thus the lack of a hyperlink.
There are some vestigial copies that people have cached on other
sites, so you might still be able to find an old version with
a little googling. But I don’t think there’s much point given
the uncertain provenance of the cached copies, the lack of
ongoing maintenance, and the fact that it doesn’t seem to work
at all on Windows versions past Win 7.


Windows Task Manager and SysInternals’ Process Explorer can both
set CPU affinities for a running process, but that’s not
sufficient for our purposes: they only let you set affinities
on live process instances, not persistently by process name,
so the settings are lost whenever a process exits and restarts.


taskset(1) — Linux manual page


TASKSET(1)                    User Commands                   TASKSET(1)

NAME         top

       taskset - set or retrieve a process's CPU affinity

SYNOPSIS         top

       taskset [options] mask command [argument...]
       taskset [options] -p [mask] pid

DESCRIPTION         top

       The taskset command is used to set or retrieve the CPU affinity
       of a running process given its pid, or to launch a new command
       with a given CPU affinity. CPU affinity is a scheduler property
       that "bonds" a process to a given set of CPUs on the system. The
       Linux scheduler will honor the given CPU affinity and the process
       will not run on any other CPUs. Note that the Linux scheduler
       also supports natural CPU affinity: the scheduler attempts to
       keep processes on the same CPU as long as practical for
       performance reasons. Therefore, forcing a specific CPU affinity
       is useful only in certain applications.
       The CPU affinity is represented as a bitmask, with the lowest
       order bit corresponding to the first logical CPU and the highest
       order bit corresponding to the last logical CPU.  Not all CPUs may
       exist on a given system but a mask may specify more CPUs than are
       present. A retrieved mask will reflect only the bits that
       correspond to CPUs physically on the system. If an invalid mask
       is given (i.e., one that corresponds to no valid CPUs on the
       current system) an error is returned. The masks may be specified
       in hexadecimal (with or without a leading "0x"), or as a CPU list
       with the --cpu-list option. For example,
       0x00000001
           is processor #0,
       0x00000003
           is processors #0 and #1,
       0xFFFFFFFF
           is processors #0 through #31,
       32
           is processors #1, #4, and #5,
       --cpu-list 0-2,6
           is processors #0, #1, #2, and #6.
       --cpu-list 0-10:2
           is processors #0, #2, #4, #6, #8 and #10. The suffix ":N"
           specifies a stride within the range; for example, 0-10:3 is
           interpreted as the list 0,3,6,9.
       When taskset returns, it is guaranteed that the given program has
       been scheduled to a legal CPU.

OPTIONS         top

       -a, --all-tasks
           Set or retrieve the CPU affinity of all the tasks (threads)
           for a given PID.
       -c, --cpu-list
           Interpret mask as numerical list of processors instead of a
           bitmask. Numbers are separated by commas and may include
           ranges. For example: 0,5,8-11.
       -p, --pid
           Operate on an existing PID and do not launch a new task.
       -V, --version
           Display version information and exit.
       -h, --help
           Display help text and exit.

USAGE         top

       The default behavior is to run a new command with a given
       affinity mask:
           taskset mask command [arguments]
       You can also retrieve the CPU affinity of an existing task:
           taskset -p pid
       Or set it:
           taskset -p mask pid

PERMISSIONS         top

       A user can change the CPU affinity of a process belonging to the
       same user.  A user must possess CAP_SYS_NICE to change the CPU
       affinity of a process belonging to another user. A user can
       retrieve the affinity mask of any process.

AUTHORS         top

       Written by Robert M. Love.

COPYRIGHT         top

       Copyright © 2004 Robert M. Love. This is free software; see the
       source for copying conditions. There is NO warranty; not even for
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

SEE ALSO         top

       chrt(1), nice(1), renice(1), sched_getaffinity(2),
       sched_setaffinity(2)
       See sched(7) for a description of the Linux scheduling scheme.

REPORTING BUGS         top

       For bug reports, use the issue tracker at
       https://github.com/karelzak/util-linux/issues.

AVAILABILITY         top

       The taskset command is part of the util-linux package which can
       be downloaded from Linux Kernel Archive
       <https://www.kernel.org/pub/linux/utils/util-linux/>. This page
       is part of the util-linux (a random collection of Linux
       utilities) project. Information about the project can be found at
       ⟨https://www.kernel.org/pub/linux/utils/util-linux/⟩. If you have
       a bug report for this manual page, send it to
       [email protected]. This page was obtained from the
       project's upstream Git repository
       ⟨git://git.kernel.org/pub/scm/utils/util-linux/util-linux.git⟩ on
       2021-08-27. (At that time, the date of the most recent commit
       that was found in the repository was 2021-08-24.) If you discover
       any rendering problems in this HTML version of the page, or you
       believe there is a better or more up-to-date source for the page,
       or you have corrections or improvements to the information in
       this COLOPHON (which is not part of the original manual page),
       send a mail to [email protected]
util-linux 2.37.85-637cc       2021-04-02                     TASKSET(1)

Pages that refer to this page:
chrt(1), 
uclampset(1), 
sched_setaffinity(2), 
cpuset(7), 
sched(7), 
migratepages(8)
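For readers who want to manipulate affinity masks programmatically, the bitmask and --cpu-list notations from the man page above can be sketched in Python. These are illustrative helper functions of my own, not part of util-linux:

```python
def mask_to_cpus(mask: int) -> list[int]:
    """List the CPU numbers whose bits are set in an affinity bitmask."""
    cpus, cpu = [], 0
    while mask:
        if mask & 1:
            cpus.append(cpu)
        mask >>= 1
        cpu += 1
    return cpus

def parse_cpu_list(spec: str) -> list[int]:
    """Parse a --cpu-list string such as '0,5,8-11' or '0-10:3'
    (the ':N' suffix is a stride within a range)."""
    cpus = set()
    for part in spec.split(","):
        stride = 1
        if ":" in part:
            part, stride_s = part.split(":")
            stride = int(stride_s)
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-"))
            cpus.update(range(lo, hi + 1, stride))
        else:
            cpus.add(int(part))
    return sorted(cpus)

# Reproducing the man page's examples:
print(mask_to_cpus(0x00000003))   # [0, 1]
print(mask_to_cpus(0x32))         # [1, 4, 5]
print(parse_cpu_list("0-2,6"))    # [0, 1, 2, 6]
print(parse_cpu_list("0-10:3"))   # [0, 3, 6, 9]
```

The hexadecimal mask 0x32 is binary 110010, which is why bits 1, 4, and 5 are set in the man page's example.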




How to set process affinity in Linux


Process affinity (also known as CPU pinning) is the practice of assigning a running program to a single thread (virtual core) instead of allowing it to run on all CPU threads. Setting process affinity is an advantage because it lets users decide exactly how much of the system’s resources a program may use.

In this guide, we’ll walk you through how to pin running programs to specific CPU threads on your Linux PC. We will also look at other ways to restrict the system resources that programs use in your Linux OS.

Finding information about your processor

Before we can get into how to bind individual programs to specific threads, we need to figure out how many threads are available on your Linux system. There are several ways to find this out; we will consider two.

The first way to find out how many threads your processor has is to use the nproc command. This command lists the number of processing units available to your Linux system.


To determine how many threads you have available with nproc, you need to open a terminal window. To open a terminal window, press Ctrl + Alt + T or Ctrl + Shift + T on your keyboard.

In an open terminal window, run nproc .

 nproc 

You will notice that after running the command, a number appears in the terminal output. That number is the count of threads (virtual cores) on your Linux system. To save this information for later, do the following.

 nproc >> ~/cpu-count.txt 

If nproc alone doesn’t give you enough information, there is a command-line tool that reports much more about your CPU threads. It is called CPU Info. Here’s how to install it.

First, make sure you have a terminal window open. Then enter the installation commands below, as appropriate for the Linux operating system you are using.

Ubuntu

On Ubuntu Linux, you can install CPU Info with the Apt package manager command below.

 sudo apt install cpuinfo 
Debian

Those using Debian Linux can install CPU Info with the following apt-get command.

 sudo apt-get install cpuinfo 
Arch Linux

Arch Linux users can install the CPU Info tool using the following Pacman command in a terminal window.

 sudo pacman -S python-py-cpuinfo 
Fedora

Are you using Fedora Linux? Install CPU Info by running the following Dnf command in a terminal.

 sudo dnf install python3-cpuinfo 
OpenSUSE

OpenSUSE Linux user? Install CPU Info with the Zypper command below.

 sudo zypper install python3-py-cpuinfo 

Once the CPU Info program is set up on your Linux PC, it’s time to use it to get information about the processor so we can determine exactly how many threads to work with.

Using the processor information command below, get your processor information.

Note: the command may be named cpuinfo rather than cpu-info on Arch Linux, Fedora, or OpenSUSE Linux.

 cpu-info 

After executing the command, you will see both the physical core count and the logical count. The logical count is the number of threads you have to work with, and it is the number that matters for this guide. Feel free to save the processor information to a text file by running the following command.

 cpu-info >> ~/cpu-count.txt 
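If you’d rather query the same counts from a script, Python’s standard library exposes them directly. A small sketch (os.sched_getaffinity is Linux-only):

```python
import os

# Total logical CPUs (threads) the OS reports -- comparable to `nproc --all`.
total = os.cpu_count()

# Logical CPUs this process is actually allowed to run on -- comparable to
# plain `nproc`, which respects the current affinity mask (Linux only).
available = len(os.sched_getaffinity(0))

print(f"logical CPUs: {total}, usable by this process: {available}")
```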

How to set process affinity on Linux

To set process affinity on your Linux PC, you will need to use the built-in Taskset program. Open a terminal window by pressing Ctrl + Alt + T or Ctrl + Shift + T on your keyboard. Then follow the step by step instructions below to learn how to bind a running process.

Step 1: Find the process ID of the running program by executing pidof followed by the application name. For example, to find the process ID of Thunderbird, you would run the example command below.

 pidof thunderbird 

Step 2: Note the application’s process ID, then plug it into the taskset example command below.

Note: change thread_number to the number of the CPU thread you want to pin the program to, and change process_id to the process ID you found with pidof.

 sudo taskset -cp thread_number process_id 

Would you like to test your newly pinned program? Run taskset -p with the program’s process ID to confirm that it is running on the CPU thread you specified in step 2.

 taskset -p process_id 
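The same pin-and-verify round trip can also be done from Python via the standard library’s scheduler calls. This is Linux-only and a sketch of the idea, not a replacement for taskset:

```python
import os

pid = 0  # 0 means "the calling process"; use a real PID to pin another process

before = os.sched_getaffinity(pid)    # like `taskset -p <pid>`
one_cpu = min(before)                 # pick a CPU we are allowed to use
os.sched_setaffinity(pid, {one_cpu})  # like `sudo taskset -cp <cpu> <pid>`
print("pinned to:", os.sched_getaffinity(pid))

os.sched_setaffinity(pid, before)     # restore the original affinity mask
```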

Learn more about Taskset

To learn more about taskset, you will need to read the manual. Run the man command on taskset.

 man taskset 

Running man taskset will present you with a detailed guide to the taskset application. Review it; it will help you understand how the application works. When finished, press q to quit.



Raster analysis tools and their uses

Licensing

Your portal administrator must configure your ArcGIS Enterprise deployment for raster analysis before you can perform raster analysis. Additionally, you will need the following rights:

  • Create, update, and delete resources
  • Publish image services

If the raster analysis tools are not available, check with your portal administrator that the deployment is configured for raster analysis and that you have the required rights.

Accessing the tools

To open and use the raster analysis tools in Map Viewer Classic, follow these steps:

  1. In Map Viewer Classic, open a web map containing the raster layer or layers you want to analyze.
  2. Click the Analyze button on the map menu.
  3. In the Analyze panel, click Raster Analysis.

If you don’t see the Analyze button or the Raster Analysis tab in Map Viewer Classic, contact your portal administrator. The portal may not be configured to work with ArcGIS Image Server, or you may not have access to these tools. If you do not have the appropriate rights to run the tools, then they will not be displayed.

Introduction to the Raster Analysis panel

The Raster Analysis panel has several categories, each containing tools. To view the tools within a category, click the button to the left of the category name to expand or collapse it. The panel’s controls let you do the following:

  • Open the Raster Function Editor window. See Raster Function Editor for details.
  • Open the Raster Function Template Browser and Web User Tools window. See Raster Functions for details.
  • Open the Analysis Environment Options dialog box.
  • Return to the Run Analysis panel.
  • View category help.
  • Expand a category to see the tools it contains.

Using Analysis Environment Options

The Analysis Environment Options button is used to access raster and image processing options that apply to all raster analysis tools. When you open the Analysis Environment Options window in the Raster Analysis panel, you can change the settings for the output coordinate system, processing extent, anchor raster, cell size, and mask.

Analysis environment Description
Output coordinate system

Specifies the coordinate system of the resulting image layer.

The following options are available:

  • Same as input — the analysis results will have the same coordinate system as the input. Used by default.
  • As specified — the analysis results will have the coordinate system you choose. If this option is selected, click the globe button to select the desired coordinate system from a list of known coordinate systems, or add the coordinate system WKID in the corresponding box.
  • Layer — analysis results will be in the same coordinate system as the existing layer selected on the web map.
Extent

Specifies the extent or boundary to apply when performing the analysis. All pixels or cells that are entirely within or intersect the specified extent will be included in the analysis.

The following options are available:

  • Default — The extent specified by the tool is used in the analysis.
  • As Specified — The extent is specified by the coordinates you specify.
  • Layer — The extent of the analysis will match the spatial extent of the existing layer selected in the web map.
Anchor Raster

Adjusts the extent of the output raster layer to match the cell alignment of the specified anchor raster.

Cell Size

Specifies the cell size or resolution used when generating the output raster layer from raster analysis. The default output resolution is determined based on the largest cell size of the input raster layers.

The following options are available:

  • Minimum of all inputs — The smallest cell size of all input layers.
  • Maximum of all inputs — The largest cell size of all input layers. Used by default.
  • As Specified — Enter a numerical value for the cell size. If this option is selected, the default value will be 1.
  • Layer — Sets the cell size to that of the selected raster layer.
Mask

Specifies the layer that will be used to define the region of interest for the analysis. Only cells that fall inside the analysis mask will participate in the analysis.

  • The mask can be a raster or vector layer.
  • If the analysis mask is a raster, all cells containing values will be considered to define the mask. Cells in the mask raster with NoData values will be treated as being outside the mask, and the corresponding cells will be set to NoData in the analysis results layer.
  • If the analysis mask is a feature layer, it will be internally converted to a raster on execution. For this reason, you should make sure that the Cell Size and Anchor Raster are set appropriately for your analysis.

When you open the Analysis Environment window from the raster analysis toolbar, you can see additional analysis environments. Some raster analysis tools take into account several analysis environments, listed in the table below. Because not all tools are aware of all environments, they are accessed from individual tools and not from the Raster Analysis pane.

Analysis environment Description
Resampling method

Defines how the raster dataset’s pixel values are interpolated when the data is transformed. This environment setting is used if the input and output are not aligned with each other, if the pixel size changes, if the data is shifted, or if any combination of these occurs.

The following options are available:

  • Nearest Neighbor — Used primarily for discrete data, such as land-use classifications, because cell values are not changed. This method is also suitable for continuous data when you want to keep the original spectral values in the image for accurate multispectral analysis. It is the most efficient in terms of processing time, but it may introduce small positioning errors: the output image may be shifted by up to half a pixel, which can give it a torn, jagged appearance.
  • Bilinear interpolation — Best suited for continuous data. This method determines a new cell value from a distance-weighted average of the four nearest cell centers in the input raster. It creates output that looks smoother than Nearest Neighbor, but it changes the spectral values, which can blur the image or reduce its resolution.
  • Cubic convolution — Also recommended for continuous data. This method determines a new cell value from a smooth curve fitted through the 16 nearest cell centers of the input raster. The result is geometrically less distorted than Nearest Neighbor and sharper than Bilinear interpolation, but output cell values may fall outside the range of the input values; if that is not acceptable, use Bilinear interpolation instead. Cubic convolution is more resource intensive and takes longer to process.
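As an illustration of the bilinear method described above, here is a minimal pure-Python sketch of the distance-weighted four-neighbor average it describes. This is my own illustration, not the ArcGIS implementation:

```python
def bilinear_sample(grid, x, y):
    """Sample a 2D grid at fractional cell coordinates (x, y) by
    weighting the four surrounding cell values by proximity."""
    x0, y0 = int(x), int(y)
    x1 = min(x0 + 1, len(grid[0]) - 1)   # clamp neighbors at the grid edge
    y1 = min(y0 + 1, len(grid) - 1)
    fx, fy = x - x0, y - y0              # fractional offsets within the cell
    top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
    bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

grid = [[0, 10],
        [20, 30]]
print(bilinear_sample(grid, 0.5, 0.5))  # 15.0, the average of all four cells
print(bilinear_sample(grid, 0.0, 0.0))  # 0.0, an exact cell center is unchanged
```

Sampling halfway between the cells returns a blend of all four values, which is exactly why bilinear output looks smoother than Nearest Neighbor but no longer preserves the original spectral values.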
Processor Type

Select whether to run the analysis using the CPU or GPU. If the Processor Type environment setting is empty, the tool uses the CPU to process data.

The following options are available:

  • CPU – processing will be performed on the central processor. CPU processing can be shared in parallel across multiple cores and instances, and controlled by the Parallel Processing Ratio.
  • GPU — processing will be performed on the GPU. Graphics processing units (GPUs) are efficient at graphics and image processing, where their highly parallel structure makes them efficient at repetitive processing of large blocks of data. Raster analysis tools that respect this environment setting can distribute their jobs across GPU instances on multiple raster analysis server machines, and will be controlled by the Parallel Processing Ratio.
Processing workflow recycle interval

Specifies how many image sections will be processed before the processing workflows are restarted; restarting them periodically prevents potential crashes in long-running jobs. The default value is 0.

Parallel Processing Ratio

Specify the number of raster processing service instances that can be used to process data.

If the tool does not respect the Processor Type environment, or if Processor Type is set to CPU, the Parallel Processing Ratio setting controls the number of raster processing service (CPU) instances. When Processor Type is set to GPU, the Parallel Processing Ratio controls the number of GPU instances used to process the raster.

By setting the Parallel Processing Ratio, you can control the number of parallel processing instances that the Raster Analysis image server uses for a single raster analysis task. However, if the total number of parallel processes exceeds the maximum number of raster service (CPU or GPU) instances, the additional processes will be queued.

If Parallel Processing Ratio is not specified, which is the default value, the tool will use 80 percent of the maximum number of raster processing service instances. You can specify an integer or a percentage as the parallel processing factor.

Number of retries on failures

Specifies the number of retries for the same workflow if the processing of a particular job randomly fails. The default value is 0.

Working with the toolbar

Click a tool’s icon on the Raster Analysis panel to open that tool’s panel; the Track Vegetation tool is the example used here. The tool panel’s controls let you do the following:

  • Open the Analysis Environment Options dialog box.
  • Close the tool panel without starting the analysis and return to the Raster Analysis panel.
  • Get help for a parameter.
  • Set the name under which the analysis result is saved in Resources.
  • Optionally specify a folder in Resources in which to save the results.
  • Choose whether only the data currently visible on the map will be used in the analysis.

Each tool has its own set of parameters. You can always open help for a parameter by clicking the help button next to it. All tools have a Result Layer Name parameter, under which the analysis results are saved. You can change this name or use the default value.

Use Current Map Extent

It’s a good idea to always check the Use Current Map Extent option and to make sure you’ve zoomed the map to the desired area of analysis. This limits the number of images or raster pixels the tool needs to process during analysis. If you disable the Use Current Map Extent option, the entire input imagery layer will potentially be analyzed.


Feedback on this section?

Pairwise Integration (Analysis)—ArcGIS Pro | Documentation


In This Section
  1. At a Glance
  2. Illustration
  3. Usage
  4. Settings
  5. Environment Settings
  6. Licensing Information

    Summary

    Analyzes the location of feature vertex coordinates in one or more feature classes. Vertices that fall within the specified distance of each other are considered to represent the same location and are assigned the same coordinate value (i.e., they "collapse"). The tool also adds new vertices where a feature vertex falls within the x,y tolerance of an edge and where feature segments intersect.

    Pairwise integration performs the following processing tasks:

    • Vertices that are within the x,y tolerance of each other will receive the same coordinate location.
    • When a vertex of one feature is within x,y tolerance of an edge of another feature, a new vertex will be added to the edge.
    • When line segments intersect, a new vertex will be added at the intersection point for each intersecting feature.

    There is an alternative tool for integrating vector data. For more information, see the documentation for the Integrate Tool.

    Illustration

    Usage

      Note:

      This tool modifies input values. For more information about strategies to prevent unwanted data changes, see Tools that modify or update input data.

    • If input features are selected, this tool will only run on those selected features.

    • This tool does the same as topology, that is, it moves features within an x,y tolerance and adds vertices where the features intersect. Consider using topology to perform this type of operation, as topology allows you to set rules and conditions for feature relationships.

      Use the Pairwise Integration tool instead of topology in the following cases:

      • You do not need to specify rules for moving features, only that all features merge within the specified tolerance.
      • You want the lines to have vertices where they intersect.
      • You are working with non-geodatabase features, such as shapefiles, or features from other geodatabases (the features in the topology must be from the same feature dataset).
    • Integration can solve many common data problems: it can correct features that are misaligned by very small amounts, automatically remove duplicate segments, and refine coordinates along boundary lines.

    • Setting the XY Tolerance parameter is not recommended. If it is not set, the tool determines the x,y tolerance from the spatial reference of the input feature class. The input data’s spatial reference should be set with the default x,y resolution and x,y tolerance. For more information about spatial references, see Spatial reference properties.

      The XY Tolerance parameter is not intended to generalize feature geometry; it is intended to match lines and polygon boundaries in the context of a properly defined spatial reference of the input feature class. Setting the XY Tolerance to something other than the default value for the input spatial reference can cause features to move too much or too little, resulting in geometry problems. If the correct spatial reference properties are used, running the tool can minimize the amount of feature movement during subsequent topological operations (such as overlay and merge).

      The XY Tolerance parameter value is critical. We recommend that you set the input feature class’s spatial reference properties to their default values. For more information about cluster processing, see Cluster processing.

    • The Pairwise Integration tool only accepts simple feature classes (points, multipoints, lines, or polygons) as input.

    • To undo changes to input features, use the Pairwise Integration tool in an edit session.

    • The output of this tool is a multivalued derived output. To use the output of this tool in another tool, use its input directly and set the output as a precondition for the other tool.

      Learn more about setting precondition

    • The tool respects the environment parameter Parallel processing factor. If the environment setting is not set (the default), or set to 100, full parallel processing will be enabled and the tool will attempt to use all logical processor cores on the computer. Setting the environment options to 0 will disable parallel processing. Setting the ratio to a value between 1 and 99 will cause the tool to determine the percentage of logical cores to use using the formula (Parallel Ratio / 100 * Logical Cores), rounding the result to the nearest whole number. If the result is 0 or 1, parallel processing will not be enabled.
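The parallel-processing-factor rule described above can be expressed directly in code. This is my own sketch of the documented formula, not ArcGIS code:

```python
import math

def cores_to_use(parallel_factor, logical_cores):
    """Apply the documented rule: unset (None) or 100 -> all cores;
    0 -> parallel processing disabled; 1-99 -> a rounded percentage of
    the logical cores (disabled again if that rounds to 0 or 1)."""
    if parallel_factor is None or parallel_factor == 100:
        return logical_cores
    if parallel_factor == 0:
        return 1  # parallel processing disabled: run on a single core
    # (Parallel Ratio / 100 * Logical Cores), rounded to the nearest whole number
    cores = math.floor(parallel_factor / 100 * logical_cores + 0.5)
    return cores if cores > 1 else 1

print(cores_to_use(None, 8))  # 8  (default: use every logical core)
print(cores_to_use(50, 8))    # 4
print(cores_to_use(10, 8))    # 1  (rounds down to ~1, so no parallelism)
```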

    Parameters

    Label Description Data type
    Input Features

    The feature classes to be integrated. If the spacing between features is small compared to the tolerance, the vertices or points will be clustered (moved to be coincident).

    Value Table

    XY Tolerance

    (Optional)

    A distance that specifies the range within which the object’s vertices coincide. To reduce unexpected vertex movements, the x,y tolerance should be small enough. If the x,y tolerance parameter is not specified, the value will be taken from the first input dataset.

    Caution:

    Changing the value of this parameter may cause failures or unexpected results. It is recommended that you do not modify this parameter; it is hidden in the tool’s dialog box. By default, the x,y tolerance of the input feature class’s spatial reference is used.

    in_features

    [in_features, ...]

    The feature classes to be integrated. If the spacing between elements is small compared to the tolerance, the vertices or points will be clustered (moved to match).

    Value Table

    cluster_tolerance

    (Optional)

    A distance that defines the range within which the object’s vertices coincide. To reduce unexpected vertex movements, the x,y tolerance should be small enough. If the x,y tolerance parameter is not specified, the value will be taken from the first input dataset.

    Caution:

    Changing the value of this parameter may cause failures or unexpected results. It is recommended that you do not modify this parameter; it is hidden in the tool’s dialog box. By default, the x,y tolerance of the input feature class’s spatial reference is used.

    Linear Unit

    Derived output data

    Name Description Data type
    out_features

    Updated input features

    Feature layer

    Sample code

    PairwiseIntegrate example 1 (Python window)

    An example of using the PairwiseIntegrate function in the Python window (immediate mode).

     import arcpy
    arcpy.env.workspace = "C:/data"
    arcpy.CopyFeatures_management("Habitat_Analysis.gdb/vegtype", "C:/output/output.gdb/vegtype")
    arcpy.PairwiseIntegrate_analysis("C:/output/output.gdb/vegtype") 

    PairwiseIntegrate example 2 (standalone script)

    An example stand-alone Python script that executes the PairwiseIntegrate function.

     # Description: Run Integrate on a feature class
    # Import system modules
    import arcpy
    # Set environment settings
    arcpy.env.workspace = "C:/data/Habitat_Analysis.