
IMPORTANT: to log in to Graham, you have to use your Compute Canada credentials (login name and password), not your SHARCNET credentials!

Graham is the largest and by far the most powerful cluster in the current SHARCNET fleet of supercomputers. Graham is also known as GP3 and is part of the major 2017 renewal of academic supercomputers in Canada, alongside the other new systems: Arbutus (GP1) at the University of Victoria, Cedar (GP2) at Simon Fraser University, and Niagara (LP) at the University of Toronto. A SHARCNET system notice will be sent to all users when Graham is ready for access. SHARCNET users will be able to log in to this system at graham.computecanada.ca with their Compute Canada username and password. In the meantime, several resources have been put in place to help users familiarise themselves with the usage of this new system.

General information about migrating work from existing systems to the new national general purpose systems is available on the Compute Canada Wiki page at:

https://docs.computecanada.ca/wiki/Code_and_job_migration_from_legacy_systems

Properties of the system, including its address, node composition, file systems, etc., can be found on the Compute Canada Wiki page at:

https://docs.computecanada.ca/wiki/Graham

Instructions for running jobs via the Slurm scheduler are available on the Compute Canada Wiki page at:

https://docs.computecanada.ca/wiki/Running_jobs
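As a quick orientation before reading that page, a minimal Slurm job script might look like the following sketch (the resource values and executable name are illustrative, and def-userid stands for your own default account, described under "Default account" below):

#!/bin/bash
#SBATCH --time=0:10:0        # wall-clock time limit (h:m:s)
#SBATCH --ntasks=1           # a single serial task
#SBATCH --mem-per-cpu=1G     # memory per core
#SBATCH -A def-userid        # scheduler account to charge
./serial_code                # your executable

You would save this as, say, job.sh, submit it with "sbatch job.sh", and monitor it with "squeue -u $USER".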

A list of software packages available on the system can be found on the Compute Canada Wiki page at:

https://docs.computecanada.ca/wiki/Available_software

A recent SHARCNET General Interest Webinar describes what to expect from the new systems and demonstrates some important usage differences from other SHARCNET systems. A recording of this webinar is available at the SHARCNET YouTube channel:

https://www.youtube.com/watch?v=VYaLlQ4Q8pI

Short introductory video recordings covering different aspects of the new national general purpose clusters are available as a playlist at the Compute Canada YouTube channel:

https://www.youtube.com/channel/UC2f3cwviToj-mazutBNhzFw

Once the Graham system is available, SHARCNET staff will present daily demonstrations of the basic workflow on Graham. Following a brief usage demonstration, the support staff will stay online for the remainder of the hour to discuss access and usage topics relating to the Graham system. These live demonstrations/discussions will be posted with other SHARCNET events on the calendar at:

https://www.sharcnet.ca/my/news/calendar


For support requests relating to the Graham system, email support@computecanada.ca or help@sharcnet.ca.


Quick facts

  • Number of CPU cores: 33448
  • Number of nodes: 1043
  • Total memory (RAM): 149 TB (4.6 GB/core on average)
  • Number of NVIDIA P100 GPUs: 320
  • Networking: EDR (CPU nodes) and FDR (GPU nodes) InfiniBand

Default account

Every Graham user gets at least one account on Graham's scheduler: the default account, named def-userid (where userid is your own login name). Users with RAC allocation(s) also get additional account(s). Each job script (and each salloc command) has to specify the intended account name via the "-A account" argument.
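For example, a half-hour interactive session charged to your default account could be requested like this (the time and task count are illustrative):

$ salloc -A def-userid -t 0:30:0 -n 1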

To make life easier, you can add the following lines at the end of your .bashrc file:

export SLURM_ACCOUNT=def-$USER
export SBATCH_ACCOUNT=$SLURM_ACCOUNT
export SALLOC_ACCOUNT=$SLURM_ACCOUNT

After that, log out and log back in for the change to take effect. From then on, you only need the "-A account" argument in your job scripts when you use a non-default account (say, a RAC account).
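With these variables set, plain submissions pick up the default account automatically, and the argument is only needed as an override. For example (rrg-someprof is a hypothetical RAC account name):

$ sbatch job_script.sh                    # charged to def-$USER
$ sbatch -A rrg-someprof job_script.sh    # charged to a RAC account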

Test jobs

Graham has a small number of interactive nodes reserved for short (<12 hours) jobs. They should only be used for testing/debugging your code.

To access these nodes, submit the job with srun directly (instead of sbatch), as in the following examples:

$ srun -t 0:10:0 -n 1 -A account -o out.log ./serial_code &
$ srun -t 0:10:0 -n 8 -A account -o out.log ./mpi_code &
$ OMP_NUM_THREADS=8 srun -t 0:10:0 -c 8 -A account -o out.log ./multithreaded_code &

You can add other srun (or sbatch) arguments, such as --mem-per-cpu.
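For example, to give the MPI test job above two gigabytes of memory per core (an illustrative value):

$ srun -t 0:10:0 -n 8 --mem-per-cpu=2G -A account -o out.log ./mpi_code &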

Useful links