- 1 What is SHARCNET?
- 2 Suitable uses
- 3 Facilities and resources
- 4 Account and login
- 5 Compiling programs
- 6 File system basics
- 7 User certification
- 8 Scheduling Jobs
- 9 Support
What is SHARCNET?
SHARCNET is a consortium of Canadian academic institutions that share a network of high-performance computers. With this infrastructure we enable world-class academic research.
- Member of Compute Canada
- 18 institutions across Ontario
- Over 3,000 Canadian and international users
- Over 20,000 processors
- 150 Tesla GPUs
- 10 Gb/s links between compute facilities
- Single name sign-up, same home directory everywhere
- Free to use
Suitable uses
- The resources are provided to enable HPC and are not intended as a replacement for a researcher's desktop or lab machines
- SHARCNET users can productively conduct HPC research on a variety of SHARCNET systems, each optimally designed for specific HPC tasks
Academic HPC Research
- The research can be business-related, but must be done in collaboration with an academic researcher
- Users have access to all systems
- Clusters are designed for certain types of jobs
- Jobs run in batch mode (through the scheduling system) with fairshare
Facilities and resources
Facilities: intended use
This listing only includes conventional clusters that are intended for production work with serial/threaded/MPI applications. For an expanded listing of all systems (including specialty clusters) see Hardware/System_Resources. An explanation of which system to use depending on your application requirements can be found here.
|System||Cores||Memory/node||Interconnect||Intended use|
|orca (Capability)||8320||32 GB||InfiniBand||large MPI, threaded (n<=24), devel nodes|
|saw (Capability)||2688||16 GB||InfiniBand||medium/large MPI, threaded (n<=8), devel nodes|
|requin (Capability)||1536||8 GB||Quadrics||medium/large MPI, test queue|
|kraken||2232||4, 8, 32 GB||Myrinet 2g (GM)||serial farming, small/medium MPI/threaded, devel nodes|
|monk||432||48 GB||InfiniBand||GPGPU (108 GPUs)|
|hound||496||128 GB||InfiniBand||large memory, threaded (n<=32)|
- SHARCNET's web site provides extensive information about deployed systems, the software stack, and help.
|Systems||clusters, SMPs, GPGPU, visualization|
|Operating systems||64-bit Linux (mainly CentOS, HP XC, Fedora)|
|Languages||Fortran, C/C++, Java, Matlab, etc.|
|Unified compilation environment||cc, c++, f77/90, mpicc, mpic++, mpif77, mpif90|
|Key parallel development support||MPI, POSIX threads, OpenMP, CUDA, OpenACC|
|Software modules||select software (and versions) with the 'module' command|
|Batch scheduling||sq: SHARCNET unified batch execution environment|
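For example, typical module operations look like this (the package name below is illustrative; run 'module avail' to see what is actually installed on a given system):
module avail               # list available software modules
module list                # show modules loaded in the current session
module load gromacs        # load a package (illustrative name)
module unload gromacs      # unload it again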
Account and login
Getting an account
- Apply for an account online (see How to apply for an account)
Accessing SHARCNET computational systems
Your login credentials are used to access both our web portal and all of our computational systems
- UNIX-based way:
- Log in to a system via SSH; you get a familiar UNIX environment (see the example commands after this list).
- Edit source code and/or change the input data/configuration file(s).
- Compile source code.
- Submit a program (or many) to the batch queuing system.
- Check results later.
You may find more details in SSH for Linux and Mac users.
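For example, a first session from a Linux or Mac machine might look like this (the user name and cluster host name are placeholders; substitute your own credentials and target system):
ssh username@orca.sharcnet.ca                            # log in to a cluster
scp input.dat username@orca.sharcnet.ca:/work/username/  # copy data to the cluster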
- MS-Windows based way:
- Start client applications such as PuTTY, WinSCP and MobaXterm from your desktop.
- Computational tasks and data are bundled and sent to the remote compute engines under the hood. Tasks (i.e. jobs) are scheduled; during this period, your client application waits.
- Once the computational tasks are complete, results are sent back using your application.
You may find more details in SSH for MS-Windows users.
Compiling programs
SHARCNET provides a unified compilation environment that chooses the right underlying compiler, options and libraries for you. Always use these commands unless you know better.
|Command||Language||Source suffixes||Example|
|cc||C||.c||cc code.c -o code.exe|
|CC, c++, cxx||C++||.C, .cc, .cpp, .cxx, .c++||CC code.cpp -o code.exe|
|f77||Fortran 77||.f, .F, .f77||f77 Fcode.f -o Fcode.exe|
|f90/f95||Fortran 90/95||.f90, .f95, .F90, .F95||f90 fcode.f90 -o code.exe|
|mpicc||C||.c||mpicc mpicode.c -o mpicode.exe|
|mpiCC||C++||.C, .cc, .cpp, .cxx, .c++||mpiCC mpicode.cc -o mpicode.exe|
|mpif77||Fortran 77||.f, .F, .f77||mpif77 mpicode.f -o mpicode.exe|
|mpif90/mpif95||Fortran 90/95||.f90, .f95, .F90, .F95||mpif90 mpicode.f90 -o mpicode.exe|
You may take a look at Getting Started with Compiling Code on SHARCNET for more details.
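For instance, building a serial C program and an MPI Fortran program looks like this (file names are illustrative):
cc hello.c -o hello        # serial C program
mpif90 sim.f90 -o sim      # MPI Fortran 90 program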
File system basics
|File system||Quota||Expiry||Scope||Intended use|
|/home||10 GB||none||unified||source code, small configuration files; backed up regularly|
|/work||1 TB||none||unified||active data files; auto mounted, convenient|
|/scratch||none||2 months||per-cluster||temporary files, checkpoints; best performance|
|/tmp||none||post-job||per-node||node-local scratch, caching|
|/freezer/$USER||2 TB||2 years||login nodes||long-term data archive|
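For example, staging input data before a run might look like this (directory and file names are illustrative):
mkdir -p /work/$USER/myproject         # create a project directory on /work
cp ~/input.dat /work/$USER/myproject/  # copy input out of /home
cd /work/$USER/myproject               # submit jobs from here, not from /home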
Scheduling Jobs
|Level||Number of CPUs||Maximum Runtime Limit|
Note: If you need resources beyond the limit, contact us to see what options are available (e.g. NRAC).
- Log on to the desired system, then:
- Ensure files are in /scratch/$USER or /work/$USER (Do not run a job out of /home)
- Jobs are submitted using the sqsub command:
sqsub -q queue_name -r run_time [ additional_sq_options ] your_program [ your_args ]
You may find more details on the Compiling and Running Programs page.
Choosing the right queue
|Job Type||Queue Name||CPUs||Nodes|
|Serial||serial||1||single node|
|Parallel||mpi||>2||single or multiple nodes|
|Parallel||threaded (system dependent)||8 (saw), 24 (orca), 16 or 32 (hound)||single node|
|GPU||gpu||1 or more||single or multiple nodes|
Note: The gpu queue is only available on systems that have GPUs installed. For a list of those, plus instructions on how to submit to the gpu queue, see GPU Accelerated Computing
You may notice other queues not listed above, such as staff, gaussian, etc. These are special queues with restrictions and are not publicly available.
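For example, submitting to the threaded and gpu queues might look like this (run times and program names are illustrative; see GPU Accelerated Computing for the exact flags a GPU job needs):
sqsub -r 1d -q threaded -n 8 -o sim.out ./sim   # 8 threads on a single node
sqsub -r 1d -q gpu -o sim_gpu.out ./sim_gpu     # GPU job on a GPU-equipped system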
Using sq commands
In the following set of examples, the first example in each section submits the job to the test queue, the second specifies an output file, and the third specifies both an output and input file:
- Submitting serial jobs
sqsub -r 7d -q serial --test ./simula
sqsub -r 7d -q serial -o simula.out ./simula
sqsub -r 7d -q serial -o simula.out -i simula.in ./simula
- Submitting parallel jobs (in this case, a 24 cpu job)
sqsub -r 7d -q mpi --test -n 24 ./simula
sqsub -r 7d -q mpi -n 24 -o simula.out ./simula
sqsub -r 7d -q mpi -n 24 -o simula.out -i simula.in ./simula
- Submitting a parallel job with a specific process distribution (24 CPUs across 6 nodes)
sqsub -r 7d -q mpi --test -n 24 -N 6 ./simula
sqsub -r 7d -q mpi -n 24 -N 6 -o simula.out ./simula
sqsub -r 7d -q mpi -n 24 -N 6 -o simula.out -i simula.in ./simula
- sqjobs - list the status of submitted jobs
[isaac@nar316 ~]$ sqjobs
  jobid user  queue state ncpus time command
 ------ ----- ----- ----- ----- ---- -------------
 136671 isaac mpi   R     20    0s   ./my_mpi_prog
1060 CPUs total, 30 idle, 1030 busy; 43 jobs running; 16 suspended, 12 queued.
- sqkill - kill a submitted job that you want to stop
[isaac@nar316 ~]$ sqjobs
  jobid user  queue state ncpus time command
 ------ ----- ----- ----- ----- ---- -------------
 136672 isaac mpi   Q     1     0s   ./my_mpi_prog
1060 CPUs total, 50 idle, 1010 busy; 42 jobs running; 16 suspended, 13 queued.
[isaac@nar316 ~]$ sqkill 136672
Job <136672> is being terminated
You may find more details about monitoring queued jobs on the Monitoring_Jobs page.
Common mistakes to avoid
- Do not run significant programs on login nodes or directly on compute nodes
- Do not request the maximum 7-day run time by default, or more memory than your job requires
- Do not run unoptimized code
- Do not create millions of tiny files, or large amounts (>GB) of uncompressed (e.g. ASCII) output; bundle and compress output where possible, as in the example below
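For example, many small output files can be bundled into a single compressed archive before storage (names are illustrative):
tar -czf results.tar.gz results/   # pack a results directory into one compressed archive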
Support
- HPC Consultants
- User-facing point of contact
- Analysis of research requirements
- Development support and job performance analysis
- Training and education
- Project and technical computing consultations
- System Administrators
- User accounts.
- System software.
- Hardware and software maintenance.
Where to look for information
- Weekly online seminars every Monday (including new user seminars).
- Education Online - general interest seminars and summer schools; slides and examples from past workshops are also available.
- System status is available on the web on the Facilities page (RSS feeds).
- Mailing lists for specific software packages.