Revision as of 18:19, 21 January 2013

Different problems require differing degrees of computational resources to solve. To meet the needs of demanding computational problems effectively, SHARCNET participates in a national resource allocation competition on behalf of Compute Canada, and also operates a continuous competition for short-duration resource allocations throughout the year.

These competitions allow researchers (and their groups) to obtain high-priority job queues on SHARCNET systems so that their jobs start more quickly than regular jobs. They also provide access to storage resources beyond the default limits (both duration and size).

Compute Canada NRAC (National Resource Allocation Competition)

The Compute Canada NRAC allocation process accepts applications for substantial dedicated resource allocations every September, as well as other specialized allocation rounds throughout the year.

If you have a Compute Canada account please log in to the Compute Canada website and visit your resource applications page for further details.

SHARCNET Continuous Small Dedicated Resources

The SHARCNET Continuous Small Dedicated Resources process accepts applications for substantial resource allocations throughout the year. The process is initiated by contacting a SHARCNET staff member to discuss the computational problem at hand and its suitability for the program.

For more information, please refer to the dedicated resources application guidelines for the current (2012) round.

Handling of Allocations at SHARCNET

Project Group and Membership

NRAC and CSDR allocations, conventionally termed NRAC projects (NRAPs) or SDR projects at SHARCNET, are handled on a per-group basis. Unlike the existing UNIX group of users sponsored by the PI, this is an additional, ad hoc group formed only for the dedicated resource allocation. It includes all participants in the project: users sponsored by the PI as well as users sponsored by other PIs. Each research project gets its own new UNIX group, named nrapnnn for NRAC allocations and sdrnnn for SDR allocations, where nnn is the application ID of the specific dedicated resource allocation. Membership in this group grants exclusive access to the dedicated queue and storage.

By default, this extended group includes the users sponsored by the group PI. To limit access to the dedicated resources, the PI may add or remove users through the Compute Canada web portal, or simply ask SHARCNET to do so.

Dedicated Queues

For each allocation, a special high-priority queue is created on the designated system for exclusive use by the project group. It lets project group members submit high-priority jobs that will start before regular jobs. The queue name takes one of the following forms:

  • NRAP_nnn for NRAC allocation nnn,
  • SDR_nnn or DR_nnn for small DR or DR allocations

where nnn refers to the corresponding application ID.
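
The naming convention above can be sketched as a small shell function. The function name queue_name and the type keywords nrac, sdr and dr are hypothetical, chosen only for illustration; they are not part of any SHARCNET tool.

```shell
# Hypothetical helper: build a dedicated queue name from the allocation
# type and the application ID, following the convention described above.
queue_name() {
    qn_kind=$1   # assumed keywords: nrac, sdr or dr
    qn_id=$2     # application ID, digits only
    case $qn_id in
        ''|*[!0-9]*)
            echo "queue_name: invalid application ID: $qn_id" >&2
            return 1 ;;
    esac
    case $qn_kind in
        nrac) echo "NRAP_$qn_id" ;;
        sdr)  echo "SDR_$qn_id" ;;
        dr)   echo "DR_$qn_id" ;;
        *)
            echo "queue_name: unknown allocation type: $qn_kind" >&2
            return 1 ;;
    esac
}
```

For instance, queue_name nrac 123 prints NRAP_123, which is the form passed to the -q option of sqsub.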

The size of the queue (both the number of jobs and the number of concurrent CPU cores or GPUs) is specified at the outset of the project based on its needs.

Dedicated Storage

If a group applied for a storage allocation, it will typically be given a new directory for the project on the required filesystem, with any storage quotas and expiry timescales set at the outset of the project based on its needs.

Each group will have access to

  • /work/group on work and
  • /scratch/group on scratch

where group is the name of the ad hoc project group, which is

  • nrapnnn for NRAC allocation nnn
  • sdrnnn or drnnn for small DR or DR allocations with application ID nnn.

The storage quota of these directories is set to the dedicated storage awarded, or to the default.
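
As a rough sketch, the two directory paths can be derived from the project group name as follows. The function name project_dirs is hypothetical, and the check only validates the form of the name (nrap, sdr or dr followed by digits); it does not confirm that the directories exist on any system.

```shell
# Hypothetical helper: print the dedicated work and scratch directories
# for a project group such as nrap123, sdr45 or dr67.
project_dirs() {
    pd_group=$1
    # Strip the expected prefix to isolate the application ID.
    case $pd_group in
        nrap*) pd_id=${pd_group#nrap} ;;
        sdr*)  pd_id=${pd_group#sdr} ;;
        dr*)   pd_id=${pd_group#dr} ;;
        *)     pd_id= ;;
    esac
    # The remainder must be a non-empty string of digits.
    case $pd_id in
        ''|*[!0-9]*)
            echo "project_dirs: unexpected group name: $pd_group" >&2
            return 1 ;;
    esac
    printf '%s\n' "/work/$pd_group" "/scratch/$pd_group"
}
```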

Running Jobs

The following points will help you get started with the dedicated resources:

1. Using Dedicated Queues

To submit a job to the dedicated queue NRAP_123, for example, use the following command:

   sqsub -q NRAP_123 -r runtime -o output ./yourprog [ yourprog_arg_list ]

where runtime is the estimated run time, given as a number followed by m, h or d, e.g. 50m for 50 minutes, 3.5h for 3.5 hours and 4d for 4 days. Your job will be terminated if it exceeds this estimate. The maximum run time for each job is seven days. The option -o output names the file to which all standard output is redirected. For details of the sqsub options, use the command

   sqsub -h
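
Because a job is terminated once it exceeds its estimate, it can help to sanity-check the runtime value before submitting. The following is a minimal sketch that converts a runtime in the format described above into minutes and rejects anything over seven days. The function name runtime_minutes is hypothetical, the result is truncated to whole minutes, and sqsub itself may accept or reject values differently.

```shell
# Hypothetical helper: convert a runtime such as 50m, 3.5h or 4d to
# minutes, and fail if it exceeds the seven-day maximum.
runtime_minutes() {
    rt_value=${1%[mhd]}        # numeric part, e.g. 3.5
    rt_unit=${1#"$rt_value"}   # trailing unit letter, e.g. h
    case $rt_unit in
        m) rt_factor=1 ;;
        h) rt_factor=60 ;;
        d) rt_factor=1440 ;;
        *)
            echo "runtime_minutes: expected a number ending in m, h or d: $1" >&2
            return 1 ;;
    esac
    case $rt_value in
        ''|.|*[!0-9.]*|*.*.*)
            echo "runtime_minutes: invalid number: $rt_value" >&2
            return 1 ;;
    esac
    # awk handles the decimal arithmetic; seven days is 10080 minutes.
    awk -v v="$rt_value" -v f="$rt_factor" 'BEGIN {
        m = v * f
        if (m > 10080) exit 1
        printf "%d\n", m
    }' || {
        echo "runtime_minutes: $1 exceeds the seven-day maximum" >&2
        return 1
    }
}
```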

2. Running Parallel Jobs

If you are running parallel jobs, i.e. MPI or threaded jobs, use the following command:

   sqsub -q dedicated_queue -f mpi|threaded -n num_cores -r runtime -o output ./yourprog [ yourprog_arg_list ]

Note that the option -f mpi (for MPI jobs) or -f threaded (for threaded jobs) MUST be used in conjunction with the dedicated queue.
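
The requirement above can be illustrated with a dry-run helper that assembles the command line and refuses any -f value other than mpi or threaded. This is only a sketch: the function name parallel_cmd is hypothetical, it prints the command rather than running sqsub, and it uses only the options shown in this section.

```shell
# Hypothetical dry-run helper: print the sqsub command for a parallel
# job on a dedicated queue instead of submitting it.
parallel_cmd() {
    if [ "$#" -lt 6 ]; then
        echo "usage: parallel_cmd queue mpi|threaded num_cores runtime output prog [args]" >&2
        return 1
    fi
    pc_queue=$1
    pc_flag=$2
    pc_cores=$3
    pc_runtime=$4
    pc_output=$5
    shift 5   # the remaining arguments are the program and its arguments
    case $pc_flag in
        mpi|threaded) ;;
        *)
            echo "parallel_cmd: -f must be mpi or threaded, got: $pc_flag" >&2
            return 1 ;;
    esac
    echo "sqsub -q $pc_queue -f $pc_flag -n $pc_cores -r $pc_runtime -o $pc_output $*"
}
```

Printing the command first makes it easy to review the queue name, core count and runtime before submitting for real.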

3. Run Time Limit and Checkpointing Jobs

As of January 2013, all dedicated resource allocation project jobs are also subject to the seven-day maximum run time on all SHARCNET systems. To guard against unexpected system outages, you should enable checkpointing in your applications so that your computational time is not wasted if an outage occurs.
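
How checkpointing is enabled depends entirely on the application. As a generic illustration of the idea only, the sketch below records the last completed step in a file and resumes from it on the next run. The function name run_steps, the checkpoint file and the notion of a "step" are all placeholders, not part of any SHARCNET tool.

```shell
# Sketch of a restartable loop: progress is saved after every step, so a
# rerun after an outage continues from the last completed step.
run_steps() {
    rs_ckpt=$1    # checkpoint file holding the last completed step
    rs_total=$2   # total number of steps to perform
    rs_step=0
    if [ -s "$rs_ckpt" ]; then
        rs_step=$(cat "$rs_ckpt")
    fi
    rs_start=$rs_step
    while [ "$rs_step" -lt "$rs_total" ]; do
        rs_step=$((rs_step + 1))
        # ... one unit of real work would go here ...
        # Write to a temporary file first so that an interruption cannot
        # leave a half-written checkpoint behind.
        echo "$rs_step" > "$rs_ckpt.tmp" && mv "$rs_ckpt.tmp" "$rs_ckpt"
    done
    echo "resumed at step $rs_start, finished at step $rs_step"
}
```

A job script built this way can simply be resubmitted after an outage or after reaching the run time limit, and it will pick up where it stopped.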

4. Queue Priority

The dedicated queue has a higher priority than the regular queues on the designated system, so jobs submitted to it have a better chance of starting sooner. You should not expect, however, that dedicated jobs will start immediately; it is common for them to wait for some time as well.

5. Access Privilege

Access to the dedicated queue and storage is restricted to the users listed in the project. This should not be confused with the access privileges of the user group sponsored by the PI: access is granted by project, not by user group. For example, a user sponsored by the PI but not listed as a project member will NOT have access to the queue or the storage.
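
Since access follows the project's UNIX group, one way to check it is to look for that group among your own groups. The sketch below assumes the standard id utility; the function name in_group and the group name nrap123 are illustrative only.

```shell
# Hypothetical helper: succeed if the wanted group appears among the
# remaining arguments, e.g. the words printed by `id -Gn`.
in_group() {
    ig_wanted=$1
    shift
    for ig_name in "$@"; do
        if [ "$ig_name" = "$ig_wanted" ]; then
            return 0
        fi
    done
    return 1
}

# Example use (left unquoted on purpose so the group list is split into words):
#   in_group nrap123 $(id -Gn) && echo "member of nrap123"
```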

6. Fairshare

Your usage of dedicated resources will not affect your group's fair share on the regular queues. Remember to always use the dedicated queue to run your dedicated resource allocation jobs. The progress, usage, and job submission history (in graphs) of your dedicated resource project are monitored and updated hourly.

Project Log

Groups can see the projects associated with their NRAC or CSDR allocations in the web portal under My Account -> Projects (https://www.sharcnet.ca/my/profile/projects). All communication with SHARCNET concerning your allocation should be conducted through this interface when possible. It also shows how far the group has progressed towards completing its allocation, along with other supporting information.