Threading

A thread is the smallest sequence of program instructions that the operating system can schedule independently.

We can adjust the number of MPI processes and the number of threads per process, but their product should ideally equal the total number of cores. For example, with 8 cores, we can have 2 processes with 4 threads each, or 1 process with 8 threads. In NEST, we recommend using only one thread per core.
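The arithmetic above can be sketched in a few lines of shell. This is only an illustration: the variable names `total_cores` and `mpi_processes` are hypothetical, and the values match the 8-core example in the text.

```shell
# Hypothetical machine layout, taken from the example above.
total_cores=8
mpi_processes=2

# Threads per process, chosen so that processes x threads = total cores.
threads_per_process=$(( total_cores / mpi_processes ))

# Tell OpenMP how many threads each MPI process may use.
export OMP_NUM_THREADS="$threads_per_process"

echo "Using $mpi_processes MPI processes with $OMP_NUM_THREADS threads each"
```

If the number of cores is not divisible by the number of processes, the integer division rounds down and some cores stay unused.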

We can control the number and placement of threads with programs that implement standards such as OpenMP.

For a detailed investigation, we recommend reading Kurth et al. 2022 [1].

Pinning threads

Pinning threads allows you to control how threads are distributed across the available cores of your system, and is particularly useful on high-performance computing (HPC) systems.

Allowing threads to move can be beneficial in some cases, but when a thread moves, the data it processes has to move too. In NEST, each thread is allocated a specific set of data objects to work with during simulation (due to the round-robin distribution). This means that when a thread moves to another core, it cannot perform any computation until its data has arrived in that core's cache. The resulting delays are caused by cache misses. For this reason, pinning threads typically decreases run time. See our overview of handling threads with virtual processes.

There are different types of pinning schemes, and the optimal scheme will depend on your script. Here we show two different example schemes.

Sequential pinning scheme

Figure 39 Sequential placing

In this scheme, the cores of one CPU are filled before moving on to the next.

Setting to use for this case: export OMP_PROC_BIND=close

Distant pinning scheme

Figure 40 Distant placing

This scheme maximizes the distance between threads in hardware.

Setting to use for this case: export OMP_PROC_BIND=spread
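Either scheme could be selected in a job script along the following lines. This is a minimal sketch rather than a recommended configuration: the variable `pinning_scheme` and the script name in the final comment are hypothetical, and the choice of `OMP_PLACES=cores` assumes you want one place per physical core.

```shell
# Hypothetical switch: "close" for sequential pinning, "spread" for distant pinning.
pinning_scheme="close"

# Assumption: treat each physical core as one OpenMP place.
export OMP_PLACES=cores

# Bind threads to places according to the chosen scheme.
export OMP_PROC_BIND="$pinning_scheme"

# Ask the OpenMP runtime to print its settings at start-up, to verify the binding.
export OMP_DISPLAY_ENV=true

# The simulation would be launched afterwards, for example:
# python my_simulation.py   (hypothetical script name)
```

Note that there must be no spaces around `=` in a shell assignment.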

Table of OpenMP settings

Table 2 OpenMP settings

| Setting | Description |
|---|---|
| export OMP_NUM_THREADS=#CPUSPERTASK# | Number of threads OpenMP uses on each MPI process |
| export OMP_PROC_BIND=true | Threads are bound to OpenMP places and do not move between them |
| export OMP_PROC_BIND=close/spread | Threads are bound as with true, and are assigned to places that are close together (close) or spread apart (spread) in hardware |
| export OMP_PLACES=threads/cores | Each OpenMP place corresponds to a hardware thread (threads) or a core (cores) |
| export OMP_PLACES="{a:b:c}" | One OpenMP place containing the b IDs a, a+c, a+2c, …, a+(b-1)c, that is, start a, count b, stride c (numbering usually relates to cores/hardware threads) |
| export OMP_DISPLAY_ENV=true | Display the OpenMP variables at program start |
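To make the interval notation for `OMP_PLACES` concrete, the sketch below expands a start, count and stride into an explicit list of IDs, following the `lower-bound:length:stride` reading of the OpenMP specification. The helper `expand_interval` is hypothetical and written only for illustration. It is not part of OpenMP or NEST.

```shell
# Hypothetical helper: expand "start:count:stride" into a comma-separated ID list.
expand_interval() {
    start=$1
    count=$2
    stride=$3
    ids=""
    i=0
    while [ "$i" -lt "$count" ]; do
        # Append a comma only if the list already has an entry.
        ids="$ids${ids:+,}$(( start + i * stride ))"
        i=$(( i + 1 ))
    done
    echo "$ids"
}

# Start at ID 0, take 4 IDs, step by 2.
place_ids=$(expand_interval 0 4 2)
echo "{0:4:2} lists the IDs {$place_ids}"
```

With these values the expansion yields the IDs 0, 2, 4 and 6. How those IDs map to cores or hardware threads depends on the system.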

Note

Using `python` on HPC systems might lead to inconsistencies between multi-threading libraries, resulting in degraded performance. For instance, depending on the installation, `numpy` uses the multi-threading library provided by the MKL. To resolve this, set `export MKL_THREADING_LAYER=GNU` so that the OpenMP settings are passed on correctly.

See also

For general details on pinning in HPC systems see the HPC wiki article.

References