saw
Hostname: saw.sharcnet.ca
Target usage: Low latency parallel jobs
Using multiples of 8 processes is optimal
System information: see saw system page in web portal
System status: see saw status page
Real time system data: see Ganglia monitoring page
Full list of SHARCNET systems


System Overview

Spec           Info                  Remarks
Cores (CPU)    Intel Xeon 2.83 GHz
Cores/node     8                     4 cores/socket
Memory/node    16 GB
Interconnect   DDR InfiniBand
Storage        130 TB
OS             CentOS 6.x
Max. Jobs      5000

For system notice/history, please visit the Saw system page in the SHARCNET web portal.

System Access and User Environment

Login nodes

[isaac@saw-login1:~] ulimit -aH
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 128568
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 8192
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) 3600
max user processes              (-u) 100
virtual memory          (kbytes, -v) 2000000
file locks                      (-x) unlimited

Please note that login nodes are only to be used for short computations that do not require a lot of resources. To ensure this, some of the resource limits on login nodes have been set to low values. If you want to see your limits, please execute:

ulimit -a

To change a limit, for example the virtual memory limit, run:

ulimit -v 2000000

which sets the virtual memory limit to 2 GB (ulimit values are given in kilobytes). Note that a soft limit can only be raised up to the corresponding hard limit shown by ulimit -aH.
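
For instance, to check a single limit and raise it to its hard ceiling (these are standard bash ulimit options, nothing SHARCNET-specific; the CPU-time limit from the listing above is used as the example):

 ulimit -S -t        # current soft CPU-time limit, in seconds
 ulimit -H -t        # hard CPU-time limit (3600 in the listing above)
 ulimit -t 3600      # raise the soft limit up to the hard ceiling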

Development nodes

# of development nodes ==> 6
hostnames ==> saw-dev1, saw-dev2, saw-dev3, saw-dev4, saw-dev5, saw-dev6
Job submission scheduler ==> N/A
Max. running time ==> 3 cpu days

Logging In

First we log into saw, then we log into a development node:

[snuser@localhost ~]$ ssh saw.sharcnet.ca 

Welcome to the SHARCNET cluster Saw.... <snip!>

[snuser@saw-login1:~]$ ssh saw-dev1

Welcome to the SHARCNET cluster Saw. ... <snip!>

Compiling

We compile on the development node in exactly the same way we compile on the login node. The same modules are available, and the same commands for compiling are used (cc, mpicc, f90, mpif90 etc.). Please refer to our Knowledge Base for more details.
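
For example, a build on a development node might look like the following; the file names and optimization flags here are only placeholders:

 # serial C program using the default compiler wrapper
 cc -O2 -o hello.x hello.c
 # MPI C program using the MPI compiler wrapper
 mpicc -O2 -o test.x test.c
 # Fortran equivalents
 f90 -O2 -o hello.x hello.f90
 mpif90 -O2 -o test.x test.f90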

If compiling a large code, doing it in the /tmp directory will make it go much faster. Just remember to move files from /tmp to somewhere in /home or /work once done, as the /tmp directory is regularly cleaned out. The /tmp directory is local to each development node, in other words each node has its own /tmp not visible from other nodes.
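
A typical pattern is sketched below; the project and file names are placeholders:

 # build in node-local /tmp for faster I/O, then copy the result back
 mkdir -p /tmp/$USER
 cp -r ~/myproject /tmp/$USER/
 cd /tmp/$USER/myproject
 make                            # or cc/mpicc as shown above
 cp test.x ~/myproject/          # /tmp is cleaned regularly and is not shared between nodes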

Running a parallel job

Suppose you have compiled (in the usual way, as described above) an MPI parallel executable called test.x in your current directory. To run it with two processes on the development nodes saw-dev1 and saw-dev2, you would execute:

`which mpirun` -n 2 -host saw-dev1,saw-dev2 ./test.x

This command should be launched from the development node.

Breaking this down:

  • `which mpirun`
    • expands to the full path of the mpirun binary from the module you currently have loaded. The full path is needed because remote shells at SHARCNET do not set up the module environment, so a bare mpirun would not be found on the remote nodes.
  • -n 2
    • specifies how many MPI processes to start (round-robin distribution across nodes by default)
  • -host saw-dev1,saw-dev2
    • specifies which development nodes the job should use, as a comma-separated list. One may also set up a hostfile (see the sketch after this list, and man mpirun for further MPI process-layout possibilities).
  • ./test.x
    • the path to your program, given explicitly (here relative to the current directory, which is on a shared filesystem) so that it is found on the remote nodes rather than searched for in PATH

Note: never list any nodes other than the 6 development nodes in the -host option.
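
As an illustration of the hostfile approach mentioned above, the sketch below assumes the loaded MPI module is Open MPI; the hosts.txt name and slots counts are placeholders, and other MPI implementations use a different hostfile syntax:

 # hosts.txt -- one development node per line; "slots" caps the processes per node
 saw-dev1 slots=4
 saw-dev2 slots=4

 # launch 8 processes laid out according to the hostfile
 `which mpirun` -n 8 --hostfile hosts.txt ./test.x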

Checking to see how busy the development nodes are

One can look at how much memory is allocated and how busy the node is with the free and uptime commands:

[snuser@saw332:~] free
            total       used       free     shared    buffers     cached
Mem:      32958080     768048   32190032          0       1304     276276
-/+ buffers/cache:     490468   32467612
Swap:     31999988          0   31999988
[snuser@saw332:~] uptime
 16:03:33 up 5 days,  1:22,  1 user,  load average: 0.07, 0.02, 0.00

The free command shows a number of values; the important one is the "free" column on the row beginning with "-/+ buffers/cache" (32467612 in the example above). This is the amount of memory available to processes, including memory temporarily set aside for buffers and caching that can be evicted, so it is a better measure of what is actually available than the "Mem:" row. If this value is significantly less than the "total" column of the "Mem:" row, the node is already using a lot of memory and may not have enough for your purposes. To inspect the free memory on all of the development nodes, run:

[isaac@saw-login1:~] pdsh -w saw-dev[1-6] free | grep 'buffers\/' | awk '{print $1,$NF}' | sort -n -r -k 2
saw-dev4: 15744252
saw-dev3: 15719876
saw-dev6: 15659388
saw-dev1: 15658500
saw-dev5: 15646976
saw-dev2: 15379180

The uptime command also shows a number of values; the important ones are the "load average" numbers, which give the number of processes in the run queue (state R) or waiting for disk I/O (state D), averaged over 1, 5, and 15 minutes. If these numbers are close to, or more than, the number of processing cores (8 on the saw development nodes), you should probably pick a different node to work on. To inspect the 15-minute load average on all the development nodes, run the following command and pick the node with the least load:

[isaac@saw-login1:~] pdsh -w saw-dev[1-6] uptime | awk '{print $1,$NF}' | sort -n -k 2
saw-dev1: 0.00
saw-dev2: 0.00
saw-dev5: 0.00
saw-dev3: 0.01
saw-dev6: 0.01
saw-dev4: 0.02
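
If pdsh is not available in your environment, a plain ssh loop gives the same information; the following is only a sketch:

 # print 15-minute load and available memory ("-/+ buffers/cache" free column) per node
 for n in saw-dev1 saw-dev2 saw-dev3 saw-dev4 saw-dev5 saw-dev6; do
     echo -n "$n: "
     ssh $n "uptime | awk '{print \$NF}'; free | awk '/buffers\//{print \$NF}'" | paste -s -d' '
 done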

Storage

Saw has 130 TB of disk storage in total (see the System Overview table above).

Submitting Jobs

In general, users will get the best performance on saw by ensuring that their jobs use whole nodes: measurements have shown that MPI jobs sharing nodes with other jobs slow down, depending on the degree of resource contention.

This means that, in general, MPI jobs should use multiples of 8 cores. Threaded jobs can use up to 8 cores, since every saw node has 8 Xeon cores.

When submitting MPI jobs, use the -N and -n flags together to ensure the job is scheduled onto full nodes. For example, if your program is going to use 64 processes, submit it as:

sqsub -q mpi -n 64 -N 8 <...>

It is important to include -N 8 so the job is not scattered across nodes where other users' jobs are running.
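
A fuller submission might look like the sketch below; the -r (runtime) and -o (output file) values are illustrative only, so check sqsub --help for the exact options and limits on saw:

 # 64 MPI processes packed onto 8 whole 8-core nodes, with a runtime estimate and output file
 sqsub -q mpi -n 64 -N 8 -r 1d -o test.log ./test.x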