
GNU parallel is a shell tool for executing serial jobs in parallel within a single compute node. It is helpful when a user has many serial (or multi-threaded) jobs to run inside one whole-node job submission, which is especially useful on the Graham cluster because it favours whole-node jobs.

  • When using GNU parallel for a publication, please cite: O. Tange (2011): GNU Parallel - The Command-Line Power Tool, ;login: The USENIX Magazine, February 2011:42-47.

Also consider our more flexible in-house software META.

Basic usage for different programs

If a user has many different programs to run:

parallel -a program_input_file -j 32 '{}'

program_input_file contains one command per line:

code1
code2
code3
...

The -j 32 flag runs up to 32 jobs at the same time (each Graham CPU node has 32 CPU cores). Whenever a job finishes, GNU parallel starts the next one.
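A whole-node job on Graham could wrap the command above in a Slurm script like the following minimal sketch. The #SBATCH values (one node, 32 CPUs, all of the node's memory via --mem=0, one hour) are assumptions to adjust; the echo commands stand in for the real programs, and -j follows $SLURM_CPUS_PER_TASK so the job count stays in step with the allocation.

```bash
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --cpus-per-task=32
#SBATCH --mem=0
#SBATCH --time=01:00:00
# Sketch of a whole-node job: the #SBATCH values above are assumptions
# (one node, 32 cores, all of the node's memory, one hour).

# Hypothetical program_input_file with one command per line;
# 'echo' stands in for the real programs code1, code2, code3.
printf '%s\n' 'echo code1' 'echo code2' 'echo code3' > program_input_file

# Run as many jobs at once as Slurm allocated cores; fall back to 32
# (a Graham CPU node) when the variable is unset, e.g. outside a job.
NJOBS=${SLURM_CPUS_PER_TASK:-32}
parallel -a program_input_file -j "$NJOBS" '{}'
```

Submit the script with sbatch.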

Same program with different input parameters

If a user wants to run one program with multiple sets of input parameters:

parallel -a param_input_file --colsep ' ' -j 32 'program {1} {2} {3}'

param_input_file contains one set of parameters per line:

--param1=a --param2=b --param3=c
--param1=d --param2=e --param3=f
--param1=g --param2=h --param3=i
...

--colsep ' ' (note the single space between the quotes) splits each input line at spaces into columns, which are passed to the program as {1}, {2} and {3}. In the example above, each line supplies 3 parameters.
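For a parameter sweep, param_input_file does not have to be typed by hand. The sketch below, with hypothetical value lists, writes every combination of three parameters as one space-separated line, in the format that --colsep ' ' expects.

```bash
#!/bin/bash
# Hypothetical value lists for the three parameters.
values1=(a d g)
values2=(b e h)
values3=(c f i)

# Write every combination as one line: "--param1=X --param2=Y --param3=Z".
for p1 in "${values1[@]}"; do
  for p2 in "${values2[@]}"; do
    for p3 in "${values3[@]}"; do
      echo "--param1=$p1 --param2=$p2 --param3=$p3"
    done
  done
done > param_input_file

wc -l < param_input_file   # 27 lines: 3 x 3 x 3 combinations
```

GNU parallel can also build such combinations itself when it is given several ::: input sources; generating a file first has the advantage that the exact list of jobs is kept for later reference.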

Redirecting the outputs

If the programs write output to STDOUT/STDERR, the user needs to redirect it into files:

parallel -a param_input_file --colsep ' ' -j 32 'program {1} {2} {3} &>{#}.out'

{#} is the job sequence number, starting from 1, so each job writes to its own file (1.out, 2.out, ...).
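When one input file is read with -a and each line becomes one job, job number n corresponds to line n of that file. The hypothetical helper below relies on that assumption to show which parameters produced a given output file; the demo file name is also illustrative.

```bash
#!/bin/bash
# job_params: print the input line that produced an output file such as 7.out.
# Assumes one input file read with -a and one job per line, so {#} = n
# corresponds to line n of that file (default: param_input_file).
job_params() {
    local n
    n=$(basename "$1" .out)              # "results/7.out" -> "7"
    sed -n "${n}p" "${2:-param_input_file}"
}

# Demo with a small hypothetical input file.
printf '%s\n' '--param1=a --param2=b --param3=c' \
              '--param1=d --param2=e --param3=f' > demo_params.txt
job_params 2.out demo_params.txt   # prints: --param1=d --param2=e --param3=f
```

Alternatively, GNU parallel's --joblog option writes a log recording each job's sequence number, exit value and command line.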

Working with NUMA

Multi-threaded jobs can also be run several at a time. The numactl command can be used to keep each job's threads within the same socket for the best performance. For example, to run many 8-core jobs, 4 of them can run at the same time on a 32-core node:

parallel -a params.input --colsep ' ' -j 4 'OMP_NUM_THREADS=8 numactl -N $(( ({%} -1) % 2 )) program {1} {2} {3} &> {#}.out'

{%} is the job slot number, which is always 1, 2, 3 or 4 when using -j 4. $(( ({%} - 1) % 2 )) computes the socket for each job: slots 1 and 3 are placed on socket 0, and slots 2 and 4 on socket 1.
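The slot-to-socket arithmetic can be checked on its own before submitting a job. This sketch evaluates the same expression in plain bash; NSOCKETS=2 is an assumption matching a two-socket, 32-core node.

```bash
#!/bin/bash
# socket_for_slot: the socket a job slot ({%}) is bound to, using the same
# expression as the parallel command above. NSOCKETS=2 is an assumption.
NSOCKETS=2
socket_for_slot() {
    echo $(( ($1 - 1) % NSOCKETS ))
}

# With -j 4, the slots are 1..4.
for slot in 1 2 3 4; do
    echo "slot $slot -> numactl -N $(socket_for_slot "$slot")"
done
```

Running numactl --hardware on a compute node lists its NUMA nodes and their CPUs, which is a quick way to confirm the socket count assumed here.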

Working with GPU

Controlling GPUs works the same way as NUMA binding. For example, to run two jobs at a time on 2 GPUs:

parallel -a params.input --colsep ' ' -j 2 'CUDA_VISIBLE_DEVICES=$(( {%} - 1 )) program {1} {2} {3} &> {#}.out'

The GPU id is the slot number {%} minus 1, that is, 0 or 1.
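The same modulo trick used for NUMA binding also lets several jobs share each GPU, for example -j 4 on 2 GPUs. This is a sketch under the assumption that the programs fit two to a GPU; NGPUS is a hypothetical setting that should match the number of GPUs requested.

```bash
#!/bin/bash
# gpu_for_slot: GPU id for a job slot ({%}). NGPUS=2 is an assumption;
# set it to the number of GPUs the job actually requested.
NGPUS=2
gpu_for_slot() {
    echo $(( ($1 - 1) % NGPUS ))
}

# With -j 4 on 2 GPUs, two jobs share each GPU. The matching command would be:
#   parallel -a params.input --colsep ' ' -j 4 \
#     'CUDA_VISIBLE_DEVICES=$(( ({%} - 1) % 2 )) program {1} {2} {3} &> {#}.out'
for slot in 1 2 3 4; do
    echo "slot $slot -> CUDA_VISIBLE_DEVICES=$(gpu_for_slot "$slot")"
done
```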