Revision as of 16:48, 6 November 2019 by Ppomorsk (Information for Previous RAC Allocations: removing obsolete)

Note: Some of the information on this page is for our legacy systems only. The page is scheduled for an update to make it applicable to Graham.

Different problems require different amounts of computational resources. To meet the needs of demanding computational problems effectively, SHARCNET participates in a national resource allocation competition on behalf of Compute Canada.

This competition allows researchers (and their groups) to obtain high-priority job queues on SHARCNET systems so that their jobs start more quickly than regular jobs. It also provides access to storage resources beyond the default limitations (both duration and size).

SHARCNET used to run an internal resource allocation competition for both longer-term (DR) and shorter-term (CSDR) allocations. This is now administered by Compute Canada through the NRAC:

Compute Canada NRAC (National Resource Allocation Competition)

The Compute Canada NRAC allocation process accepts applications for substantial dedicated resource allocations every September. It also runs other specialized allocation rounds throughout the year.

If you have a Compute Canada account, please log in to the Compute Canada website and visit your resource applications page for further details.

If you would like to apply for a RAC allocation outside the formal round, you must make a request by email. We strongly recommend that you contact us first so that we can make sure you are prepared to make a reasonable request.

Compute Canada Direct Links

o General Compute Canada FAQ

o Research Portal Rapid Access Service (RAS)

o 2017 Q&A October 2016 Presentation (RAC vs RAS)

o Resource Allocation Competitions (RAC) hosted by Compute Canada (RPP + RRG)

o Research Platforms and Portals (RPP) Competition

o Resources for Research Groups (RRG) Competition

Running Jobs

Before you start, make sure you have your project group name ready. It takes one of the forms rrg-username, rrg-username-xy, rpp-username or rpp-username-xy, where username is the PI's username and xy is a two-letter string. The name should be included in the most recent message from the RAC about your allocation. The following points will help you get started with the dedicated resources:

1. Using Dedicated Queues: To submit a job to a dedicated queue, for example rrg-username-xy, use the following command:

   sqsub -q rrg-username-xy -r runtime -o output ./yourprog [ yourprog_arg_list ]

where runtime is the estimated run time, given as a number followed by a unit suffix (m, h or d). For example, use 50m for 50 minutes, 3.5h for 3.5 hours or 4d for 4 days. Your job will be terminated if it runs longer than this estimate, and the maximum run time for any job is seven days. The option -o output names the file that receives all of the job's standard output. For details of all sqsub options, use the command:

   sqsub -h
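The runtime format can be illustrated with a small client-side check. The sketch below is not part of sqsub. It converts a value such as 50m, 3.5h or 4d to hours and rejects anything above the seven-day maximum. It assumes that the number may have a decimal part, as the 3.5h example suggests. The names runtime_hours and check_runtime are hypothetical.

```python
import re

# Maximum run time per job stated on this page: seven days, in hours.
MAX_RUNTIME_HOURS = 7 * 24

# Hours represented by each unit suffix in the [m|h|d] format.
_UNIT_HOURS = {"m": 1 / 60, "h": 1.0, "d": 24.0}

# ASSUMPTION: a number (optionally with a decimal part, as in 3.5h)
# followed by exactly one unit suffix.
_RUNTIME_RE = re.compile(r"(\d+(?:\.\d+)?)([mhd])")


def runtime_hours(runtime: str) -> float:
    """Convert a runtime string such as '50m', '3.5h' or '4d' to hours."""
    match = _RUNTIME_RE.fullmatch(runtime)
    if match is None:
        raise ValueError(f"expected a value like 50m, 3.5h or 4d, got {runtime!r}")
    value, unit = float(match.group(1)), match.group(2)
    return value * _UNIT_HOURS[unit]


def check_runtime(runtime: str) -> float:
    """Return the runtime in hours, rejecting anything above the 7-day maximum."""
    hours = runtime_hours(runtime)
    if hours > MAX_RUNTIME_HOURS:
        raise ValueError(f"{runtime} exceeds the {MAX_RUNTIME_HOURS}-hour maximum")
    return hours
```

For example, check_runtime("4d") returns 96.0, while check_runtime("8d") raises ValueError.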

2. Running Parallel Jobs: If you are running parallel MPI jobs, use the following command:

   sqsub -q rrg-username-xy -f mpi -n num_cores -r runtime -o output ./yourprog [ yourprog_arg_list ]

Likewise, for threaded jobs, use the following command:

   sqsub -q rrg-username-xy -f threaded -n num_cores -r runtime -o output ./yourprog [ yourprog_arg_list ]

Note that when you submit parallel jobs to the dedicated queue, you MUST use the option -f mpi (for MPI jobs) or -f threaded (for threaded jobs).

To run jobs that use GPUs on GPU-equipped systems, add the option --gpp=num_gpus_per_process. For example, to use one GPU per process, use the command:

   sqsub -q rrg-username-xy -r runtime -o output --gpp=1 ./yourprog [ yourprog_arg_list ]
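Items 1 and 2 can be combined in a sketch that assembles an sqsub command line. It uses only the options shown above (-q, -f, -n, -r, -o and --gpp). This is a hypothetical helper, not a SHARCNET tool. Two parts of it are assumptions drawn from the examples on this page: the queue-name pattern, and the rule that -f and -n are given together. The helper only builds the argument list and does not submit anything.

```python
import re
import shlex
from typing import List, Optional, Sequence

# Dedicated queues are named after the project group: rrg-username,
# rrg-username-xy, rpp-username or rpp-username-xy.
# ASSUMPTION: usernames are lowercase letters/digits; xy is two lowercase letters.
DEDICATED_QUEUE_RE = re.compile(r"(rrg|rpp)-[a-z0-9]+(-[a-z]{2})?")


def build_sqsub_command(queue: str, runtime: str, output: str, program: str,
                        args: Sequence[str] = (),
                        framework: Optional[str] = None,
                        ncores: Optional[int] = None,
                        gpus_per_process: Optional[int] = None) -> List[str]:
    """Return an sqsub argument list built only from options shown on this page."""
    if DEDICATED_QUEUE_RE.fullmatch(queue) is None:
        raise ValueError(f"{queue!r} does not look like a dedicated queue name")
    if framework not in (None, "mpi", "threaded"):
        raise ValueError("framework must be None, 'mpi' or 'threaded'")
    # ASSUMPTION: mirror the examples, where -f and -n always appear together.
    if (framework is None) != (ncores is None):
        raise ValueError("give framework and ncores together for parallel jobs")
    cmd = ["sqsub", "-q", queue]
    if framework is not None:
        cmd += ["-f", framework, "-n", str(ncores)]
    cmd += ["-r", runtime, "-o", output]
    if gpus_per_process is not None:
        cmd.append(f"--gpp={gpus_per_process}")
    cmd.append(program)
    cmd.extend(args)
    return cmd


if __name__ == "__main__":
    example = build_sqsub_command("rrg-username-xy", "4d", "output", "./yourprog",
                                  framework="mpi", ncores=16)
    print(shlex.join(example))  # prints the command; it does not submit it
```

When run as a script, it prints sqsub -q rrg-username-xy -f mpi -n 16 -r 4d -o output ./yourprog. This matches the MPI form given in item 2.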

3. Run Time Limit and Checkpointing Jobs: As of January 2013, jobs from all dedicated resource allocation projects are also subject to the seven-day maximum run time on all SHARCNET systems. To guard against unexpected system outages, you should enable checkpointing in your applications so that your computation time is not wasted if a system goes down. Checkpointing also lets a computation that needs more than seven days continue across several consecutive jobs.
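The following is a minimal sketch of application-level checkpointing. It assumes that the program's state fits in a small JSON file; real codes usually checkpoint much larger state in their own format. The file name checkpoint.json and the functions below are hypothetical. The sketch shows two key points:

- On restart, the program resumes from the last saved state.
- Each checkpoint is written to a temporary file and then renamed into place, so an outage during a write leaves the previous checkpoint intact.

```python
import json
import os
import tempfile

CHECKPOINT = "checkpoint.json"  # hypothetical file name


def load_state(path: str = CHECKPOINT) -> dict:
    """Resume from the last checkpoint, or start fresh if there is none."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0}


def save_state(state: dict, path: str = CHECKPOINT) -> None:
    """Write to a temp file, then rename, so a crash mid-write keeps the old file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # atomic on POSIX when both are on the same file system


def run(n_steps: int, every: int = 100, path: str = CHECKPOINT) -> dict:
    """Do n_steps units of work, checkpointing every `every` steps."""
    state = load_state(path)
    while state["step"] < n_steps:
        state["total"] += state["step"]  # stand-in for one unit of real work
        state["step"] += 1
        if state["step"] % every == 0:
            save_state(state, path)
    save_state(state, path)
    return state
```

You can resubmit the same job after an outage, or after the seven-day limit ends a run. The resubmitted job continues from the last saved step instead of starting again from step 0.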

4. Queue Priority: The dedicated queue has a higher priority than the regular queues on the designated system, so jobs submitted to it are likely to start sooner. However, you should not expect dedicated jobs to start immediately. It is common for them to wait for some time before they start.

5. Access Privilege: Access to the dedicated queue and storage is restricted to the users listed in the project. Do not confuse this with the access privileges of the user group sponsored by the PI. Access to the dedicated queue and storage is granted by project, not by user group. For example, a user sponsored by the PI but not listed as a project member will NOT have access to the queue or the storage.

6. Fairshare: Your usage of dedicated resources will not affect your group's fair share in the regular queues. Remember to always use the dedicated queue to run your dedicated resource allocation jobs. The progress, usage, and job submission history of your dedicated resource project are monitored, and the corresponding graphs are updated hourly.

Dedicated Queue and Storage Terms

Access to the dedicated queue(s) and storage is subject to the terms of the awarded allocation. The term of a RAC allocation is normally one year, and unused allocations cannot be carried over. After an allocation expires, its queue and storage are closed. We advise you to track your usage and progress so that you can make full use of your allocation before it expires.

For each National Resource Allocation Project (NRAP), we monitor usage, job submission history, disk usage and the number of files. We provide graphs of these in the project log (see the next section) for quick access. Sample graphs follow.


Figure 1: Progress of the project over time.

In the progress graph, the green curve shows how much of your allocation you have actually used to date. The black diagonal line represents continuous, linear progress. The blue line represents progress toward using 80% of the allocated resources. The point where the red line meets the horizontal axis marks the latest time at which you can start running your jobs. The red line then shows your progress if you run jobs continuously at the maximum rate from that point. The green curve must therefore stay to the left of the red line; otherwise you will not be able to complete your allocation by the end of the term.
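The red-line condition can be written as simple arithmetic. The sketch below is one reading of the graph and rests on two assumptions:

- The allocation is measured in core-days.
- The "maximum rate" means running continuously on max_cores cores.

Under those assumptions, the red line meets the horizontal axis at day T - A / max_cores. You can still finish only while used + max_cores * (T - day) >= A. The class and field names are hypothetical, and the actual graphs may use different units.

```python
from dataclasses import dataclass


@dataclass
class Allocation:
    core_days: float  # total allocation A, in core-days (assumed unit)
    max_cores: int    # ASSUMPTION: most cores your jobs can occupy at once
    term_days: float  # length of the allocation term T, e.g. 365

    def linear_target(self, day: float) -> float:
        """Black diagonal: fraction used if progress were perfectly linear."""
        return day / self.term_days

    def last_start_day(self) -> float:
        """Day at which the red line meets the horizontal axis."""
        return self.term_days - self.core_days / self.max_cores

    def red_line(self, day: float) -> float:
        """Minimum fraction that must be used by `day` to still finish on time."""
        remaining_capacity = self.max_cores * (self.term_days - day)
        return max(0.0, 1.0 - remaining_capacity / self.core_days)

    def still_feasible(self, day: float, used_core_days: float) -> bool:
        """True while the green curve stays left of (above) the red line."""
        return used_core_days / self.core_days >= self.red_line(day)
```

Consider, for example, a 36,500 core-day allocation (100 core-years) with at most 200 cores over a 365-day term. Its last start day is 182.5. If only 10,000 core-days have been used by day 300, still_feasible returns False.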


Figure 2: Job submission/execution history.

Figure 3: Usage of the disk storage.
Figure 4: Number of files in the storage.

Warning: When the number of files exceeds 1 million, your jobs will be put on hold and will not be able to start until you reduce the number of files.
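A short script can show how close a directory is to the one-million-file limit by walking it and counting entries. This is a sketch only. It assumes the limit counts non-directory entries, which the page does not specify. On very large trees, a walk like this can itself be slow. The names count_files and over_limit are hypothetical.

```python
import os
import sys

# Limit quoted in the warning above.
FILE_LIMIT = 1_000_000


def count_files(root: str) -> int:
    """Count non-directory entries under root.

    os.walk does not follow symbolic links to directories by default, so
    symlinked directories are not descended into.
    """
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += len(filenames)
    return total


def over_limit(root: str, limit: int = FILE_LIMIT) -> bool:
    """True if the number of files under root exceeds the limit."""
    return count_files(root) > limit


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    n = count_files(target)
    print(f"{n} files under {target} (limit {FILE_LIMIT})")
    if n > FILE_LIMIT:
        print("Over the limit: jobs may be held until the file count is reduced.")
```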

Each NRAP group has access to a project log, linked in the web portal under "My Account" -> "Projects". This is where you receive notices about your project and about system updates. Each project member may post messages to the project log to communicate with us and with other members of the research group.

Using Project Log

Groups can see the projects associated with their NRAC or CSDR allocations in the web portal under My Account -> Projects. When possible, conduct all communication with SHARCNET about your allocation through this interface. The interface also shows how far each group has progressed toward completing its allocation, along with other supporting information.