
SHARCNET Storage Structure And Policies

Version 2.0
2015-7-10

Overview

All SHARCNET systems are designed to present as uniform a look and feel to users as possible, including storage layout and use.

Since our primary resources are large compute clusters, storage must be organized to facilitate effective use of these resources. In particular, since the systems can process or generate very large amounts of data, storage must be optimized to ensure that adequate short-term storage is always available, so that CPU cycles are never wasted and a storage hierarchy can handle the need for longer-term and archival storage.

The proposed storage structure recognizes the need to store different types of data for different time periods and with different levels of data integrity and performance. The structure is intended to encourage flexible, cost-effective and sophisticated use of the SHARCNET storage infrastructure by the user community.

Organization of User Spaces

Several specific user storage areas are in place. Each has different policies to encourage its proper use.

/home

intended for source code, parameter files etc. - data that is modest in size and may be subject to frequent modification. This data has an indefinite lifetime and may be considered secure. There is a firm quota of 10GB per account. This limit permits effective backups and recognizes the necessity, in large-scale HPC environments, of segregating large data and log files from source code etc. Each user has a single common /home across all systems. It is backed up and a sequence of increments is kept to allow for recovery of past data over some period. If you need recovery of data on /home, contact help@sharcnet.ca.

When a user has filled the full amount of storage available in their /home directory, the hard quota will prevent additional data from being written there. While this will not stop the user from submitting and running new jobs, or prevent existing jobs from running, any output directed towards the /home directory will be lost.
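For this reason, job output should normally be directed to /scratch rather than /home. The following Python sketch illustrates one way to do this; the per-user /scratch/<username> layout and the job name "run42" are assumptions for illustration and vary by cluster.

```python
# Sketch: build an output path under /scratch rather than /home so job output
# is not lost if the /home quota is reached. The /scratch/<username> layout
# is an assumption -- adjust to your cluster.
import os

def output_dir(job_name):
    """Return a per-job output directory under the user's /scratch space."""
    user = os.environ.get("USER", "unknown")
    path = os.path.join("/scratch", user, job_name)
    os.makedirs(path, exist_ok=True)   # create it if it does not already exist
    return path

# Example: write results for a hypothetical job called "run42" to /scratch.
with open(os.path.join(output_dir("run42"), "results.txt"), "w") as f:
    f.write("output goes to /scratch, not /home\n")
```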


/work

intended for longer-term storage of large data files. Each user has a globally available /work directory with an initial quota of 1TB.

Exceeding storage quotas on the /work filesystems will result in the following measures:

  1. Initially, the user's account is marked as over quota in the output of the "quota" command on the clusters, and the user is given a 3-day grace period in which to reduce their usage (a sketch for estimating your own usage follows this list).
  2. If the user has any jobs currently running when they are noted by the system as over quota, a warning email will be sent to the user regarding their usage.
  3. If the user is over their quota on a Sunday night, they will be sent an automated warning email as well.
  4. At the end of the grace period, if the usage of the /work directory is still in excess of the user's quota, the user will be placed into the 'over quota' group on the clusters, and will not be able to run new jobs until their account has been cleared of over quota status. Jobs can still be submitted, but they will remain queued and not begin execution.
  5. Quota scanning runs are started every 24 hours (typically at midnight) on each cluster, and as each scan completes, user quota status is updated. Some clusters occasionally take longer than others to scan due to filesystem problems or heavy usage. If your account is flagged as over quota and you have freed the excess space, you can ask to have your over quota status cleared ahead of the next scan by sending an email to help@sharcnet.ca.
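For a rough estimate of your usage without waiting for the next scan, the Python sketch below walks a /work directory and totals file sizes. The /work/<username> path is an assumption for illustration, and the figure it reports is approximate; it may differ from the quota system's own accounting.

```python
# Sketch: estimate how much space you are using under your /work directory.
# The /work/<username> layout is an assumption -- check your cluster.
import os

def usage_bytes(top):
    """Walk a directory tree and total the apparent size of regular files."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total

if __name__ == "__main__":
    work = os.path.join("/work", os.environ.get("USER", ""))
    print("Approximate /work usage: %.1f GB" % (usage_bytes(work) / 1e9))
```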

/scratch

intended as time-limited, short-term storage that is optimized to facilitate effective use of the compute resources. There is no space limit, but files will be removed after 62 days. An email is sent to users before the data is destroyed. Most clusters have a single /scratch mounted by all nodes. On most clusters, /scratch is the highest-performing file system and should be used as the primary location for data I/O as jobs execute.
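The Python sketch below shows how a user might list files under their /scratch space that have not been modified in more than 62 days and so may be candidates for removal. The /scratch/<username> layout is an assumption for illustration; the actual expiry process follows the policy above, not this script.

```python
# Sketch: list files under your /scratch directory older than 62 days.
# The 62-day threshold matches the policy above; the /scratch/<username>
# layout is an assumption.
import os
import time

CUTOFF_SECONDS = 62 * 24 * 3600

def stale_files(top):
    """Yield paths of regular files whose modification time exceeds the cutoff."""
    now = time.time()
    for dirpath, _dirs, files in os.walk(top):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                if now - os.path.getmtime(path) > CUTOFF_SECONDS:
                    yield path
            except OSError:
                pass  # file may have been removed while scanning

if __name__ == "__main__":
    scratch = os.path.join("/scratch", os.environ.get("USER", ""))
    for path in stale_files(scratch):
        print(path)
```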


/tmp

node-local storage is available under /tmp. This storage can be used for checkpointing or data output prior to (perhaps asynchronous) assembly and output to /scratch or /work. SHARCNET will encourage use of /tmp for applications which require, or could benefit from, node-local storage. Data can be stored here for short periods to service running jobs, or jobs that have recently checkpointed or completed.
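As a rough illustration of this pattern, the following Python sketch writes a checkpoint to node-local /tmp and then copies it out to /scratch so it survives the end of the job. The checkpoint format and the /scratch/<username> path are assumptions, not prescribed by SHARCNET.

```python
# Sketch: write a checkpoint to fast node-local /tmp, then stage it out to
# the shared /scratch file system. Paths and format are assumptions.
import os
import shutil

def checkpoint(state_bytes, name):
    """Write a checkpoint locally, then copy it to /scratch."""
    local = os.path.join("/tmp", name)            # fast, node-local write
    with open(local, "wb") as f:
        f.write(state_bytes)

    dest_dir = os.path.join("/scratch", os.environ.get("USER", ""), "checkpoints")
    os.makedirs(dest_dir, exist_ok=True)
    shutil.copy2(local, os.path.join(dest_dir, name))  # copy out to shared storage

checkpoint(b"example simulation state", "step_001.chk")
```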


swap

adequate swap is available (perhaps 2x memory as appropriate) on some nodes to enable suspension of jobs by the scheduler.


/freezer

intended for longer-term storage of data that is not being actively used but will still be needed in the future. Freezer storage is mounted as /freezer on all SHARCNET cluster login nodes, but is not accessible from the compute nodes. All users have a 2TB quota and files will be expired after 2 years. Freezer is not intended as an archive solution; it is for cold storage of data and results while publications are finalized. Going over quota on /freezer is similar to going over quota on /work: it places the user into the no-run group until the over quota condition is cleared.
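As an illustration of this cold-storage workflow, the Python sketch below moves a finished results directory from /work to /freezer. It must be run on a login node, since /freezer is not mounted on the compute nodes; the per-user paths and the "project_results" directory name are assumptions for illustration.

```python
# Sketch: on a login node, move a finished results directory from /work
# into /freezer for cold storage. Paths are assumptions -- adjust as needed.
import os
import shutil

user = os.environ.get("USER", "")
src = os.path.join("/work", user, "project_results")     # hypothetical finished results
dst = os.path.join("/freezer", user, "project_results")

os.makedirs(os.path.dirname(dst), exist_ok=True)
shutil.move(src, dst)   # remember: /freezer is not visible from compute nodes
print("moved", src, "->", dst)
```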

It is possible to request a modest quota increase on any file system by sending email to help@sharcnet.ca. When requesting a quota increase, please include information regarding how long your expected need will last and the type of work for which you will be using the increased storage.

Users/groups requiring large storage for several months may apply to the Compute Canada Resource Allocation Committee. Users/groups requiring exceptionally large storage for very long periods of time are encouraged to purchase additional private storage to be housed and managed within the SHARCNET infrastructure.

Users should be aware that reasonable steps are taken to ensure data integrity (all file systems are RAID 5, RAID 6 or similar, except /tmp which is unprotected), but that SHARCNET cannot guarantee data integrity or security over any time period. In particular, users should note that parallel file systems, including the Lustre-based system used by SHARCNET for /scratch and /work, are inherently more complicated than standard disk arrays and should be used with this knowledge in mind. Please send any comments to sharcnet-policies@sharcnet.ca.