numactl is a utility which can be used to control NUMA policy for processes or shared memory. NUMA (Non-Uniform Memory Access) is a memory architecture in which a given CPU core has different access speeds to different regions of memory. Typically, each core has a region of memory attached directly to it which it can access quickly (local memory), while access to the rest of memory is slower (non-local memory). This contrasts with symmetric multiprocessing (SMP) architecture, in which every processor is connected to the whole of main memory at uniform speed. For some programs, understanding the NUMA memory structure and accessing it appropriately may be crucial for obtaining good performance, which usually depends on making sure that the data needed by a process is in local memory as often as possible.
The clusters on SHARCNET which have NUMA architecture include hound (nodes 10 to 19) and orca. The Compute Canada clusters graham and cedar also have NUMA architecture.
For the full listing of options to numactl, execute:

man numactl
To obtain information about the NUMA memory architecture, you can log into a compute node and execute:

numactl --hardware
numactl --show
The output of these commands on orca is:
[ppomorsk@orc10:~] numactl --hardware
available: 4 nodes (0-3)
node 0 size: 8058 MB
node 0 free: 7656 MB
node 1 size: 8080 MB
node 1 free: 7930 MB
node 2 size: 8080 MB
node 2 free: 8051 MB
node 3 size: 8080 MB
node 3 free: 8062 MB
node distances:
node   0   1   2   3
  0:  10  20  20  20
  1:  20  10  20  20
  2:  20  20  10  20
  3:  20  20  20  10
[ppomorsk@orc10:~] numactl --show
policy: default
preferred node: current
physcpubind: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
cpubind: 0 1 2 3
nodebind: 0 1 2 3
membind: 0 1 2 3
This indicates that each orca compute node has 24 processors, arranged into 4 groups (NUMA memory nodes) of 6 cores, with each group having access to a local memory region of roughly 8 GB. The node distances table shows that accessing another group's memory is about twice as costly as accessing local memory.
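As a quick sanity check on these figures, the per-node sizes reported by numactl --hardware can be totalled with a short awk one-liner. This is just an illustrative sketch: the here-document simply reproduces the orca output quoted above, so it runs anywhere, even without numactl installed.

```shell
# Sum the "node N size:" lines from saved `numactl --hardware` output.
summary=$(awk '/^node [0-9]+ size:/ { total += $4; n++ }
               END { printf "%d nodes, %d MB total", n, total }' <<'EOF'
available: 4 nodes (0-3)
node 0 size: 8058 MB
node 0 free: 7656 MB
node 1 size: 8080 MB
node 1 free: 7930 MB
node 2 size: 8080 MB
node 2 free: 8051 MB
node 3 size: 8080 MB
node 3 free: 8062 MB
EOF
)
echo "$summary"   # 4 nodes, 32298 MB total
```

On a live system you would pipe the real output instead: numactl --hardware | awk '...'.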
You can use numactl to control quite precisely how your program uses memory, specifying which CPUs execute the program and which memory nodes it allocates from. Quite often, simply requiring that the program use only local memory results in much improved performance. This can be achieved by running it under numactl with the -l (--localalloc) flag:
numactl -l <myprogram>
The tradeoff is that the program is then limited to its local memory region, which is smaller than the total memory of the node. (Strictly speaking, -l expresses a preference, and the kernel may fall back to other nodes when local memory is exhausted; with the stricter --membind flag, allocation fails outright once the bound memory nodes are full.)
This kind of control is best used when requesting a full node for your job. Requesting only part of a node restricts which cores your job may use, which in turn restricts what numactl can do.
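On the Slurm-based clusters (graham, cedar), a whole-node request might look like the following sketch. The core count (32) and program name (./myprogram) are placeholders to adjust for your cluster; the SBATCH directives themselves are standard Slurm options.

```shell
#!/bin/bash
# Hypothetical whole-node job script (adjust cpus-per-task for the cluster).
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=32   # placeholder: all cores on one node
#SBATCH --mem=0              # request all memory on the node
#SBATCH --time=01:00:00

# With the full node allocated, numactl can bind freely.
numactl -l ./myprogram
```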