The new national general-purpose systems, Graham and Cedar, offer massive compute resources to all Canadian research teams, and those teams have massive workloads to run on them. The scheduler is the part of the system that accepts resource requests for the execution of computational procedures (jobs) and dispatches those jobs to the compute resources. On national general-purpose systems like these, the job queue typically holds requests for more resources than the cluster has available at any given time. The scheduler must therefore assign a priority to each queued job to determine when it becomes eligible for dispatch relative to the other jobs in the queue.
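The idea of priority-ordered dispatch under limited resources can be illustrated with a toy sketch. This is not how Slurm is implemented or configured on Graham and Cedar; the `Job` class, the `dispatch` function, the core counts, and the priority values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str        # hypothetical job label
    cores: int       # cores requested by the job
    priority: float  # made-up priority; higher means dispatched sooner


def dispatch(queue, free_cores):
    """Start jobs in descending priority order until one does not fit.

    Toy model only: a real scheduler weighs many more factors and
    resource types than a single core count.
    """
    started = []
    for job in sorted(queue, key=lambda j: j.priority, reverse=True):
        if job.cores > free_cores:
            break  # the next-highest-priority job must wait for resources
        started.append(job.name)
        free_cores -= job.cores
    return started, free_cores


# The queue requests 112 cores in total, but only 64 are free.
queue = [Job("a", 32, 0.5), Job("b", 16, 0.9), Job("c", 64, 0.7)]
started, remaining = dispatch(queue, free_cores=64)
print(started, remaining)
```

In this sketch job `b` starts first because it has the highest priority, and job `c` then waits because only 48 cores remain, which is the kind of situation that produces queue wait times.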
This General Interest webinar describes how the Slurm scheduler on the national systems is configured to dispatch jobs to resources. Understanding the scheduler at this level offers researchers several benefits: better prediction of queue wait times; potentially shorter wait times through informed job resource requests; and a clear definition of the difference between general-purpose Rapid Access Service (RAS) and Resources for Research Groups (RRG) allocations.
Beyond describing the scheduler configuration as it relates to job dispatching, this webinar will also provide recommended best practices for submitting jobs and demonstrate tools for monitoring jobs and the job queue on Graham and Cedar.