=Storage=
== What storage is available ==

The SHARCNET clusters have a variety of storage systems available to users. Which one you place your files into depends on your specific needs. In all cases, user directories are stored as (filesystem)/(userid): the user "sharcnet" would find their home directory at /home/sharcnet, their work directory at /work/sharcnet, and their scratch directory at /scratch/sharcnet. The only exceptions are the kraken login nodes, where sub-cluster specific scratch directories are stored as /scratch/(subcluster)/(user); on a kraken login node, the sharcnet user's scratch directory for whale nodes is therefore /scratch/wha/sharcnet. The kraken compute nodes follow the standard /scratch/(userid) pattern.

Below is a list of the filesystems that are available on the SHARCNET clusters:

# /home
#* Space: 10 GB
#* Purpose: Storage of configuration files and small source code trees
#* Available on: all login and compute nodes, and visualization machines
#* Quota type: Hard limit - once exceeded, no more files can be written
# /scratch
#* Space: Variable, depending on the cluster
#* Purpose: Temporary storage for fast access to files during data processing. /scratch should absolutely not be used as a long term storage location.
#* Available on: all login and compute nodes of an individual cluster have access to that cluster's scratch filesystem. Kraken nodes use the scratch for their local sub-cluster, and all kraken sub-cluster scratches are available on the login nodes. Visualization machines have individual local scratch filesystems.
#* Quota type: Timed expiry - files unchanged for 62 days are automatically removed.
# /work
#* Space: Global work (on most clusters) has a 1 TB quota; local work (on the mako and requin clusters) has a 200 GB quota
#* Purpose: Long term storage of source code, program code, and data that is being actively used
#* Available on: all login and compute nodes - the mako cluster has access only to its own /work directories, requin uses local /oldwork and mounts global work as /gwork, and all other clusters and visualization machines mount global work as /work
#* Quota type: Soft limit - once exceeded, limits on cluster resources are enforced until usage is below the limit again
# /freezer
#* Space: 2 TB
#* Purpose: Long term storage of data that is not currently being used, but may be needed later
#* Available on: all cluster login nodes
#* Quota type: 2-year expiry
# /tmp
#* Space: Small; varies by cluster and node
#* Purpose: Very short term, local data storage during calculations. /tmp files cannot be relied on to remain past the end of a job's run.
#* Available on: node-local storage; each node has an independent /tmp drive which is not accessible across the cluster, or on the login nodes.
#* Quota type: Periodic purging of the /tmp drive between running jobs.

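For example, a user can confirm where their own directories are on a given cluster with something like the following (using $USER for the userid; on kraken login nodes the scratch path follows the /scratch/(subcluster)/(user) pattern noted above instead):

 ls -ld /home/$USER /work/$USER /scratch/$USER
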
== Quota and how it works ==
Space usage on the home and work filesystems is monitored through a quota system. To see your current usage according to this system, use the ''quota'' command when logged into a cluster or visualization machine, like this:

 [sharcnet@req769:~] quota
 Filesystem          Limit      Used            File Count  Checked
 jones:/home          10 GB      *11.3 GB (112%)        1,986    12h ago
 lundun:/work        1 TB        20.9 MB (0%)        323,313  10h ago

The meanings of the sections of the output are as follows:

'''Filesystem''': Indicates the cluster and directory on which the user's data in question is stored. The special clusters 'lundun' and 'gulf' represent the global work directories, which are accessible across all of SHARCNET's clusters and visualization machines.

'''Limit''': Indicates the maximum amount of storage space you are allowed to use on the filesystem in question.

'''Used''': Indicates the amount of space currently occupied by your files on the indicated filesystem. Any entry which is over the limit is marked with a * - in the example above, the sharcnet user is over their /home quota limit.

'''File Count''': Indicates the total number of files contained in your directory on the filesystem in question.

'''Checked''': Indicates how long ago the most recent complete usage check finished on the indicated filesystem. Quota scans are typically run every 24 hours, starting just after midnight, and depending on the filesystem and cluster they can take anywhere from 5 minutes to several hours to complete.

Additionally, if your account has had resource limitations applied to it because it was over quota on a /work filesystem for too long, a warning about this will be displayed before the regular output of the quota command.

As monitoring your quota is a good idea, you may want to add a ''quota'' line to your .bashrc file so that your current usage is displayed every time you log into a cluster, and you become aware of any overages as soon as you log in.

Finally, if you are the owner of a Dedicated Resources project, your DR project directory will also be listed in the output of the quota command, and will be labelled as such, like this:

 [sharcnet@req769:~] quota
 Filesystem              Limit      Used            File Count  Checked
 jones:/home              10 GB      *11.3 GB (112%)      1,986    12h ago
 lundun:/work            1 TB        20.9 MB (0%)      323,313  12h ago
 gulf:/work/nrap12345    15 TB      12.2 TB (81%)    1,343,636  13h ago

== Home Quota ==
=== Effects of Overage ===

The typical /home quota is 10 GB per user and is enforced as a hard limit. This means that when your usage exceeds the allowed space, you will not be able to write additional files to your home directory, and you will receive write errors if you attempt to do so. No other restrictions are placed on accounts which have exceeded their quota on the /home filesystem; however, because the default job submission places log entries and some output from your job into your home directory, this data from your jobs may become unavailable if you were over your home quota when the job completed.

=== Fixing Overage ===

To correct overages in your home directory, log into a SHARCNET cluster and remove files from your home directory, either by deleting them or by moving them to the /work or /freezer filesystems. Once your usage is below the limit, you will be able to use the directory again immediately, regardless of the output of the quota command, which only updates once per day.

To verify the amount of space used in your directory immediately, you can use this command:

 [sharcnet@bul129:~] du -sh /home/sharcnet
 124M    /home/sharcnet

One thing to note: the ''du'' command can sometimes take a while to run on these filesystems, so you may need to be patient with it.

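As a sketch of the "move files to /work" approach (the directory name results_2019 is just a hypothetical example), a large directory can be moved out of /home like this:

 mv /home/$USER/results_2019 /work/$USER/
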
== Work Quota ==
=== Effects of Overage ===

Global /work directories have a quota of 1 TB, and cluster-local /work filesystems have quotas of 200 GB, enforced as a soft limit. This means that you can still write files to your directory while over quota, but other resource limitations are placed on your account when you are found to be over quota.

Temporarily exceeding your /work quota at the end of a job and removing the excess files immediately will have no effect on your account. Quota scans begin just after midnight, and as long as your usage is back below your quota by then, the system will not even notice that there was an issue.

If your /work usage is still over quota when the nightly filesystem scan runs, your account will be flagged as over quota and placed into a 3-day grace period before resource limits are applied. This will cause your usage of that filesystem to be marked with a * in the 'quota' command's output, but otherwise places no limits on your account or on your ability to submit and run jobs, to allow you time to clean up your excess usage without penalty.

If your /work usage remains over quota for more than three days, resource limitations will be applied which prevent you from submitting new jobs to the clusters and prevent any already queued jobs from running. You can check whether your account has had resource limits placed on it by using the ''groups'' command while logged into a cluster and checking whether the group 'ovrquota' appears in the resulting list, like this:

 [sharcnet@gb2:~] groups
 sharcnet certlvl1 guelph_users fairshare-1 '''ovrquota'''

If you have jobs running or queued when your account is marked as being over quota, a warning email will be sent to your SHARCNET account informing you of the overage. Additionally, if your account is over quota at the end of the week (Sunday night), a warning email will be sent to your SHARCNET account even if no jobs are currently running.

=== Fixing Overage ===

If your account has had resource limitations placed on it due to being over quota for more than three days, there are two steps to correcting the problem:

First, remove enough files from your directory to get your actual usage below your quota - this can be done by deleting the files, by copying them to your own local storage and then deleting them, or by moving them to /freezer for long term storage. You can verify that your usage is below the limit by using the ''du'' command, like this:

 [sharcnet@saw-login1:~] du -sh /work/sharcnet
 843G    /work/sharcnet

As with the /home filesystem, the ''du'' command can take a considerable amount of time to run, especially if you have a large number of files and directories.

The second step in getting the over-quota resource limitations removed from your account is to wait for the next quota scan to complete, at which time your over-quota status will be cleared and the 'ovrquota' group will be removed from your account.

If you still have login sessions open from when you were over quota, you may need to log out of those sessions and log back in to completely clear the ovrquota status. If this is the case, the 'quota' command will report that you are not over on your usage, but the 'groups' command will still show you belonging to the 'ovrquota' group.

== Scratch Expiration ==
The scratch filesystems are intended for short term storage of data files, providing local filesystem access on each cluster that is faster than the global /work filesystem. Because there is no space limitation, the scratch filesystems work on an age-based expiration system, where files that have not been altered for more than the expiration period are removed as "stale". The current expiration time for the scratch filesystems is 62 days.

Because of this expiration, it is important to remember that you should not use /scratch for long term storage, or for storage of important output files - any files produced by jobs that you need to keep should be moved out of /scratch into your /work or /freezer directory immediately after your job has completed.

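If you want to see which of your scratch files are getting close to the expiry limit, a ''find'' command along these lines will list files not modified in the last 50 days (adjust the path on kraken login nodes and the -mtime value to suit):

 find /scratch/$USER -type f -mtime +50
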
Additionally, because there are no space usage limitations on the /scratch filesystem, it is possible for a few users to occupy the entire scratch filesystem, leaving all of the users on that cluster unable to write to it. To check that space is available, you can use this command:

 [sharcnet@orc-login1:~] df -h /scratch/
 Filesystem            Size  Used Avail Use% Mounted on
 10.27.9.132@o2ib:10.27.9.133@o2ib:/orcalfs
                       54T  47T  6.9T  88% /orc_lfs

In this case, the output indicates that the scratch filesystem on the orca cluster has a total of 54 TB, 47 TB of which is currently in use, leaving only 6.9 TB available for other users, with the filesystem 88% full. Good practice is to remove your old data files after your job has completed, to ensure that the filesystem does not fill up with unwanted data.

== Archive Storage ==
'''Note''': the /archive filesystem has recently been discontinued and replaced by /freezer.

Long term archival storage of files on SHARCNET clusters is provided through the /freezer filesystem. '''Please note:''' unlike the old /archive filesystem, /freezer has both a size quota (2 TB; going over the quota results in your submitted jobs not running, the same as with /work) and an expiry: after 2 years your files will be deleted. See our [https://www.sharcnet.ca/my/systems/storage storage policies page] for details. The /freezer filesystem is accessible only from the login nodes of the various clusters, and to use it, you simply move the files you wish to archive into your /freezer directory. As an example, the "sharcnet" user, wishing to move an entire directory called My_Results from their /work directory to /freezer, would do so by logging into the cluster and using these commands:

 [sharcnet@gup-hn:~] cd /work/sharcnet
 [sharcnet@gup-hn:/work/sharcnet] mv My_Results /freezer/sharcnet/

Please note that if you are moving files into /freezer, you should either use the ''mv'' command to move the files, or delete the original copies after copying them.

For moving substantial amounts of data, users should use the machine '''dtn.sharcnet.ca''', which is a dedicated data transfer node.

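As a rough sketch only (this assumes your /work and /freezer directories are both visible on the data transfer node), a large directory could be copied from the data transfer node, with the originals removed only after you have verified the copy:

 ssh dtn.sharcnet.ca
 rsync -av /work/$USER/My_Results/ /freezer/$USER/My_Results/
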
Additionally, for source code storage and backups, SHARCNET provides a Git repository service; usage instructions are here: https://www.sharcnet.ca/help/index.php/Git

== Requesting an Exemption ==
Users who need a larger amount of storage space can submit a request for an extension to help@sharcnet.ca. The request should include:

* Which filesystem the quota extension is requested on (global work, requin work, mako work, home directory, or the scratch timeout period)
* What the space will be used for
* Why the space is needed, rather than simply placing most of the data in /freezer

== Advice for best practices ==
=== Watch your quota ===

The first and most important step in managing your disk usage on SHARCNET is to keep a close eye on how much space you are using with the ''quota'' command. The easiest way to do this is to add it to the end of the .bashrc file in your home directory, so that your quota status is displayed every time you log into a SHARCNET cluster or visualization machine; see the example below.

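For example, a one-time command like the following appends it (or you can simply edit ~/.bashrc in a text editor and add a line containing "quota" at the end):

 echo 'quota' >> ~/.bashrc
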
=== Clean up unused files ===

Any data which you are not currently using for your work should be moved to /freezer storage - this saves you from running out of space in your work directory, and also keeps the scratch filesystems free of unneeded data and available for everyone.

Any data you no longer need at all should be deleted.

=== Small numbers of large files are better than large numbers of small files ===

A large number of files in a single directory will make file access slow for that directory. To prevent this, first try to use sub-directories to divide your files into more manageable groups. Generally, you can place around 1000 files in a directory before access to the directory starts to be slowed by the number of files.

For archiving, large collections of small files should be bundled together with the ''tar'' command, like this:

 [sharcnet@mako2:/work/sharcnet] tar -c -v --remove-files -f /freezer/sharcnet/worlds-20090324.tar world*

This collects all of the files with names starting with "world" into a single archive file named "/freezer/sharcnet/worlds-20090324.tar", indicating that they were "world" files from March 24th, 2009. The "/freezer/sharcnet/" part means that this file is created on the /freezer filesystem, so the archived data also stops counting against your disk quota.

The parameters of the tar command have the following meanings:

"-c" means "create"; this is used to create a new archive.

"-v" means "verbose"; this causes ''tar'' to print a list of all of the files it is adding to the archive, as it adds them.

"--remove-files" causes ''tar'' to delete the original files after they are successfully added to the archive file.

"-f worlds-20090324.tar" causes ''tar'' to write the results to the '''f'''ile '''worlds-20090324.tar'''. Without this, ''tar'' will simply write the archive to standard output instead of creating the archive file.

Optionally, the "-z" option can be added to compress your files as they are archived - if you do this, you should add '''.gz''' to the end of your archive file name.

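For example, a compressed version of the same archive (adding -z and a .gz suffix to the command shown above) would be created like this:

 [sharcnet@mako2:/work/sharcnet] tar -c -v -z --remove-files -f /freezer/sharcnet/worlds-20090324.tar.gz world*
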
To view a list of the files contained in an archive, you can use this command:

 [sharcnet@mako2:/work/sharcnet] tar -t -v -f /freezer/sharcnet/worlds-20090324.tar
  -rw-r--r-- sharcnet/sharcnet 4529 2009-03-22 09:44:21 world-novirt0
  -rw-r--r-- sharcnet/sharcnet 4515 2009-03-22 09:43:02 world-novirt1
  -rw-r--r-- sharcnet/sharcnet 29850 2009-03-22 09:28:18 world-yesvirt0
  ...

Using the "-t" parameter instead of "-c" tells ''tar'' to list the contents of the archive rather than creating a new one. If you compressed your archive with "-z", you will also need to include that option in the command used to list its contents.

Lastly, to extract the files into your current directory, you would use:

 [sharcnet@mako2:/scratch/sharcnet] tar -x -v -m -f /freezer/sharcnet/worlds-20090324.tar

In this case, the two changed parameters are as follows:

"-x" causes ''tar'' to extract the archived files into whatever directory you are currently in. In the above example, we are extracting them into the /scratch/sharcnet directory on the mako cluster.

"-m" is important to use when extracting archived files to /scratch, as it resets the modification time on those files to the current time, so that they will not accidentally be expired early while you are still using them.

Again, if your archive file was compressed with the "-z" option, you will need to include that option in the command used to extract the files.


Legacy systems are older systems which are about to be retired, or given a new set of software to make them similar to orca and graham. This page collects all information applicable to these systems.


Systems

Current legacy systems are:

All these systems should be accessed as "system.sharcnet.ca".

The information on this page does not apply to graham.

Running jobs

What is the batch job scheduling environment SQ?

SQ is a unified frontend for submitting jobs on SHARCNET, intended to hide unnecessary differences in how the clusters are configured. On clusters which are based on RMS, LSF+RMS, or Torque+Maui, SQ is just a thin shell of scripting over the native commands. On Wobbie, the native queuing system is called SQ.

To submit a job, you use sqsub:

sqsub -n 16 -q mpi -r 5h ./foo

This submits foo as an MPI command on 16 processors with a 5 hour runtime limit (make sure to be somewhat conservative with the runtime limit as a job may run for longer than expected due to interference from other jobs). You can control input, output and error output using these flags:

sqsub -o outfile -i infile -e errfile -r 5h ./foo

this will run foo with its input coming from a file named infile, its standard output going to a file named outfile, and its error output going to a file named errfile. Note that using these flags is preferred over shell redirection, since the flags permit your program to do IO directly to the file, rather than having the IO transported over sockets, then to a file.

For threaded applications (which use Pthreads, OpenMP, or fork-based parallelism), do this:

sqsub -q threaded -n 2 -r 5h -o outfile ./foo

For serial jobs

sqsub -r 5h -o outfile ./foo

How do I check running jobs and control jobs under SQ?

To show your jobs, use "sqjobs". By default, it will show you only your own jobs. With "-a" or "-u all", it will show all users; similarly, "-u someuser" will show jobs only for that particular user.

The "state" listed for a job is one of the following:

  • Q - queued
  • R - running
  • Z - suspended (sleeping)
  • D - done (shown briefly on some systems)
  •  ? - unknown (something is wrong, such as a node crashing)

Times shown are the amount of time since being submitted (for queued jobs) or since starting (for all others).

To kill, suspend or resume your jobs, use sqkill/suspend/resume with the job ID as shown by sqjobs.
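
For example, using a job ID as reported by sqjobs (5648719 here is just the sample ID used elsewhere on this page, and the suspend/resume commands are assumed to be named sqsuspend and sqresume):

sqjobs -u all        # show every user's jobs
sqkill 5648719       # kill the job
sqsuspend 5648719    # or suspend it
sqresume 5648719     # and later resume it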


Handling long jobs with chained job submission

Job dependencies can be handled similarly on the legacy MOAB-scheduled systems via the -w flag to the sqsub command. Once you have ensured that your job can automatically resume from a checkpoint, the best way to conduct long simulations is to submit a chain of jobs, such that each subsequent job depends on the jobid before it. This will minimize the time your subsequent jobs wait to run.

This can be done with the sqsub -w flag, eg.

    -w|--waitfor=jobid[,jobid...]]
                   wait for a list of jobs to complete

For example, consider the following instance where we want job #2 to start after job #1. We first submit job #1:

[snuser@bul131 ~]$ sqsub -r 10m -o chain.test hostname
WARNING: no memory requirement defined; assuming 1GB
submitted as jobid 5648719

Now when we submit job #2 we specify the jobid from the first job:

[snuser@bul131 ~]$ sqsub -r 10m -w 5648719 -o chain.test hostname
WARNING: no memory requirement defined; assuming 1GB
submitted as jobid 5648720

Now you can see that two jobs are queued, and one is in state "*Q" - meaning that it has conditions:

[snuser@bul131 ~]$ sqjobs
  jobid  queue state ncpus nodes time command
------- ------ ----- ----- ----- ---- -------
5648719 serial     Q     1     -  15s hostname
5648720 serial    *Q     1     -   2s hostname
2232 CPUs total, 1607 busy; 1559 jobs running; 1 suspended, 6762 queued.
403 nodes allocated; 154 drain/offline, 558 total.

Looking at the second job in detail we see that it will not start until the first job has completed with an "afterok" status:

[snuser@bul131 ~]$ qstat -f 5648720 | grep -i depend
    depend = afterok:5648719.krasched@krasched 
    -N hostname -l pvmem=1024m -m n -W depend=afterok:5648719 -l walltime=

In this fashion it is possible to string many jobs together. The second job (5648720) should continue to accrue priority in the queue while the first job is running, so once the first job completes the second job should start much more quickly than if it were submitted after the first job completed.

I get errors trying to redirect input into my program when submitted to the queues, but it runs fine if run interactively

The standard method to attach a file as the input to a program when submitting to SHARCNET queues is to use the -i flag to sqsub, e.g.:

sqsub -q serial -i inputfile.txt ...

Occasionally you will encounter a situation where this approach appears not to work, and your program fails to run successfully (reasons for which can be very subtle). Here is an example of one such message that was being generated by a FORTRAN program:

lib-4001 : UNRECOVERABLE library error 
    A READ operation tried to read past the end-of-file.

Encountered during a list-directed READ from unit 5 
Fortran unit 5 is connected to a sequential formatted text file 
    (standard input). 
/opt/sharcnet/sharcnet-lsf/bin/sn_job_starter.sh: line 75: 25730 Aborted (core dumped) "$@"

yet if run on the command line, using standard shell redirection, it works fine, e.g.:

program < inputfile.txt

Rather than struggle with this issue, there is an easy workaround: instead of submitting the program directly, submit a script that takes the name of the file for input redirection as an argument, and have that script launch your program making use of shell redirection. This circumvents whatever issue the scheduler is having by not having to do the redirection of the input via the submission command. The following shell script will do this (you can copy this directly into a text file and save it to disk; the name of the file is arbitrary but we'll assume it to be exe_wrapper.sh).

Bash Shell script: exe_wrapper.sh
#!/bin/bash
 
EXENAME=replace_with_name_of_real_executable_program
 
if (( $# != 1 )); then
        echo "ERROR: incorrect invocation of script"
        echo "usage: ./exe_wrapper.sh <input_file>"
        exit 1
fi
 
./${EXENAME} < ${1}

Note that you must edit the EXENAME variable to reference the name of the actual executable; the script can also easily be modified to take or provide additional arguments to the program being executed, as desired. Ensure the script is executable by running chmod +x exe_wrapper.sh. You can now submit the job by submitting the *script*, with a single argument being the file to be used as input, i.e.:

sqsub -q serial -r 5h -o outputfile.log ./exe_wrapper.sh inputfile.txt

This will result in the job being run on a compute node as if you had typed:

./program < inputfile.txt

NOTE: this workaround, as provided, will only work for serial programs, but can be modified to work with MPI jobs by further leveraging the --nompirun option to the scheduler, and launching the parallel job within the script using mpirun directly. This is explained below.

How do I submit a large number of jobs with a script?

There are two methods: you can pack a large number of runs into a single submitted job, or you can use a script to submit a large number of jobs to the scheduler.

With the first method, you would write a shell script (let us call it start.sh) similar to the one found above. On requin with the older HP-MPI it would be something like this:

#!/bin/csh
/opt/hpmpi/bin/mpirun -srun ./mpiRun1 inputFile1
/opt/hpmpi/bin/mpirun -srun ./mpiRun2 inputFile2
/opt/hpmpi/bin/mpirun -srun ./mpiRun3 inputFile3
echo Job finishes at `date`.
exit

On orca with OpenMPI the script would be (note that the number of processors should match whatever you specify with sqsub):

#!/bin/bash
/opt/sharcnet/openmpi/1.6.2/intel/bin/mpirun -np 4 --machinefile $PBS_NODEFILE ./mpiRun1
/opt/sharcnet/openmpi/1.6.2/intel/bin/mpirun -np 4 --machinefile $PBS_NODEFILE ./mpiRun2
/opt/sharcnet/openmpi/1.6.2/intel/bin/mpirun -np 4 --machinefile $PBS_NODEFILE ./mpiRun3

Then you can submit it with:

sqsub -r 7d -q mpi -n 4 --nompirun -o outputFile ./start.sh

Your mpi runs (mpiRun1, mpiRun2, mpiRun3) will run one at a time, using all available processors within the job's allocation, i.e. whatever you specify with the -n option in sqsub. Please be aware of the total execution time for all runs, as with a large number of jobs it can easily exceed the maximum allowed 7 days, in which case the remaining runs will never start.

With the second method, your script would contain sqsub inside it. This approach is described in Serial / parallel farming (or throughput computing).

How can I have a quick test run of my program?

Debugging and development often require the ability to quickly test your program repeatedly. At SHARCNET we facilitate this work by providing a pre-emptive testing queue on some of our clusters, and a set of interactive development nodes on the larger clusters.

The test queue is highly recommended for most test cases as it is convenient and prepares one for eventually working in the production environment. Unfortunately the test queue is only available on Requin, Goblin and Kraken. Development nodes allow users to work interactively with their program outside of the job scheduling system and production environment, but we only set aside a limited number of them on the larger clusters. The rest of this section will only address the test queue, for more information on development nodes see the Kraken, Orca or Saw cluster pages.

The test queue allows one to quickly test a program in the job environment to ensure that the job will start properly, and it can be useful for debugging. It also has the benefit that it will allow you to debug any size of job. Do not abuse the test queue, as doing so will have an impact on your fairshare job scheduling priority and it has to temporarily interrupt other users' production jobs, slowing down other users of the system.

Note that the flag for submitting to the test queue is provided in addition to the regular queue selection flag. If you are submitting a MPI job to the test queue, both -q mpi and -t should be provided. If you omit the -q flag, you may get odd errors about libraries not being found, as without knowing the type of job, the system simply doesn't know how to start your program correctly.

To perform a test run, use sqsub option --test or -t. For example, if you have an MPI program mytest that uses 8 processors, you may use the following command

sqsub --test -q mpi -n 8 -o mytest.log ./mytest

The only difference here is the addition of the "--test" flag (note -q appears as would be normal for the job). The scheduler will normally start such test jobs within a few seconds.

The main purpose of the test queue is to quickly verify the startup of a changed job - just to check that a real, production run won't hit a bug shortly after starting due to, for instance, missing parameters.

The "test queue" only allows a job to run for a short period of time (currently 1 hour), therefore you must make sure that your test run will not take longer than this to finish. Only one test job may be run at a time. In addition, the system monitors the user submissions and decreases the priority of submitted jobs over time within an internally defined time window. Hence if you keep submitting jobs as test runs, the waiting time before those jobs get started will be getting longer, or you will not be able to submit test jobs any more. Test jobs are treated as "costing" four times as much as normal jobs.

I can't run jobs because I'm overquota?

If you exceed your /work disk quota on our systems you will be placed into a special "overquota" group and will be unable to run jobs. SHARCNET's disk monitoring system runs periodically (typically O(day)) so if you have just cleaned up your files you may have to wait until it runs again to update your quota status. One can see their current quota status from the system's point of view by running:

 quota $USER

If you can't submit jobs even after the system has updated your status it is likely because you are logged into an old shell which still shows you in the overquota unix group. Log out and back in again and then you should be able to submit jobs.

If you're cleaning up and not sure how much space you are using on a particular filesystem, then you will want to use the du command, eg.

 du -h --max-depth=1 /work/$USER

This will count space used by each directory in /work/$USER and the total space, and present it in a human-readable format.

For more detailed information please see the Using Storage article.

How long will it take for my queued job to start?

In practice, if your potential job does not cause you to exceed your user certification per-user process limit and there are enough free resources to satisfy the processor and memory layout you've requested for your job, and no one else has any jobs queued, then you should expect your jobs to start immediately. Once there are more jobs queued than available resources, the scheduler will attempt to arbitrate between the resource (CPU, memory, walltime) demands of all queued jobs. This arbitration happens in the following order: Dedicated Resource jobs first, then "test" jobs (which may also preempt normal jobs), and finally normal jobs. Within the set of pending normal jobs, the scheduler will prefer jobs belonging to groups which have high Fairshare priority (see below).

For information on expected queue wait times, users can check the Recent Cluster Statistics table in the web portal. This is historical data and may not correspond to the current job load on the cluster, but it is useful for identifying longer-term trends. The idea is that if you are waiting unduly long on a particular cluster for your jobs to start, you may be able to find another similar cluster where the waittime is shorter.

Although it is not possible to predict the start time of a queued job with much accuracy, there are some tools that can be used while logged into the systems to help estimate a relevant wait time range for your specific jobs.

First of all it is important to gather information about the current state of the scheduling queue. By exploring the currently running and queued jobs in the queue you can get a general picture of how busy the system is. With these tools it is also possible to get a more specific picture of queue times for jobs that are similar to your jobs in terms of resource requests. Because the resource requests of a job play a major role in dictating its wait time it is important to base queue time estimates on jobs that have similar requests.

The program showq can be used to view the jobs that are currently running and queued on many systems:

$ showq

active jobs------------------------
JOBID              USERNAME      STATE PROCS   REMAINING            STARTTIME
... 

For more detailed information about the queued jobs use

$ showq -i

eligible jobs----------------------
JOBID                 PRIORITY  XFACTOR  Q  USERNAME    GROUP  PROCS     WCLIMIT     CLASS      SYSTEMQUEUETIME
...

A more general listing of queue information can also be obtained using qstat as follows:

$ qstat
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
...


Once the queue has been explored, further details about specific jobs can be obtained to help in estimating queue time. In many instances it is useful to filter the displayed jobs to only show those with specific characteristics that relate to the job type of interest. For instance, all of the queued mpi jobs can be listed by calling:

$ sqjobs -aaq --queue mpi
  jobid     user queue state ncpus   time command
------- -------- ----- ----- ----- ------ -------
...

Note that the --queue option to sqjobs, beyond filtering to the standard serial, threaded, mpi and gpu queues, can also filter the output for jobs in specific NRAP queues. This can be particularly important information when managing use within resource allocation projects.

Once specific jobs have been identified in the queue that share resource requests with the type of job you would like queue time estimates for (e.g. a 32-process mpi job), you can obtain more details about them by calling:

$ sqjobs -l [jobid]
key                value
------------------ -----
jobid:             ...
queue:             ...
ncpus:             ...
nodes:             ...
command:           ...
working directory: ...
out file:          ...
state:             ...
submitted:         ...
started:           ...
should end:        ...
elapsed:           ...
cpu time:          ...
virtual memory:    ...
real/virt mem:     ...
 

  jobid     user queue state ncpus   time command
------- -------- ----- ----- ----- ------ -------
...

... or further calling:

$ qstat -f [jobid]
Job Id: ...
   Job_Name = ...
   Job_Owner = ...
   resources_used.cput = ...
   resources_used.mem = ...
   resources_used.vmem = ...
   resources_used.walltime = ...
   job_state = ...
   queue = ...
   server = ...
   Account_Name = ...
   Checkpoint = ..
   ctime = ...
   Error_Path = ...
   exec_host = ...
   Hold_Types = ...
   Join_Path = ...
   Keep_Files = ...
   Mail_Points = ...
   mtime = ...
   Output_Path = ...
   Priority = ...
   qtime = ...
   Rerunable = ...
   Resource_List.cput = ...
   Resource_List.procs = ...
   Resource_List.pvmem = ...
   Resource_List.walltime = ...
   session_id = ...
   Shell_Path_List = ...
   etime = ...
   submit_args = ...
   start_time = ...
   Walltime.Remaining = ...
   start_count = ...
   fault_tolerant = ...
   submit_host = ...
   init_work_dir = ...


Even though there is rich information available from the scheduling queue for building estimates of future job wait times, there is no way to estimate queue wait times with certainty, as the scheduling queue is a very dynamic process in which influential properties change on every scheduling cycle. Furthermore, there are many parameters to consider, not only of the jobs currently queued and running, but also of the priority ranking of the submitting user and group.

Another way to minimize your queue waittime is to submit smaller jobs. Typically it is harder for the scheduler to free up resources for larger jobs (in terms of number of cpus, number of nodes, and memory per process), and as such smaller jobs do not wait as long in the queue. The best approach is to measure the scaling efficiency of your code to find the sweet spot where your job finishes in a reasonable amount of time but waits for the least amount of time in the queue. Please see this tutorial for more information on parallel scaling performance and how to measure it effectively.

What determines my job priority relative to other groups?

Fairshare is based on a measure of recent (currently, past 2 months) resource usage. All user groups are ranked into 5 priority levels, with the heaviest users given lowest priority. You can examine your group's recent usage and priority here: Research Group's Usage and Priority.

This system exists to allow for new and/or light users to get their jobs running without having to wait in the queue while more resource consuming groups monopolize the systems.

My job cannot allocate memory

If you did not specify the amount of memory your job needs when you submitted the job, resubmit the job specifying the amount of memory it needs.

If you specified the amount of memory your job needed when it was submitted, then the memory requested was completely consumed. Resubmit your job with a larger memory request. (If this exceeds the memory actually available, then you will have to make your job use less memory.)

The default memory is usually 2G on most clusters. If your job requires more memory and is failing with a message "Cannot allocate memory", you should try adding the "--mpp=4g" flag to your sqsub command, with the value (in this case 4g - 4 gigabytes) set large enough to accommodate your job.

Memory is a limited resource, so jobs requesting more memory will likely wait longer in the queue before running. Hence, it is to the user's advantage to provide an accurate estimate of the memory needed.

Let us say your matlab program is called main.exe, and that you'd like to log your output in main.out ; to submit this job for 5 hours you'd use sqsub like:

sqsub -o main.out -r 5h ./main.exe

By default it will be attributed an amount of memory dependent on which system you are using (1GB on orca). To increase the amount of memory to 2GB, for example, add "--mpp=2G":

sqsub --mpp=2G -o main.out -r 5h ./main.exe

If that still doesn't work you can try increasing it further.

Furthermore, you can change the requested memory for a queued job with the command qalter (in this example to 5 GB):

qalter -l pvmem=5160m jobID

where jobID would be replaced by the actual ID of a job.

Where can I find available resources?

Changes in the status of each system, such as downtime or power outages, are announced through the following three channels:

  • Web links under systems. You need to check the web site from time to time in order to catch such public announcements.
  • System notice mailing list. This is the passive way of being informed: you receive the notices by e-mail as soon as they are announced, though some people may find this annoying. Also, such notices may be buried in dozens or hundreds of other e-mail messages in your mailbox, and hence are easily missed.
  • SHARCNET RSS broadcasting. A good analogy of RSS is like traffic information on the radio. When you are on a road trip and you want to know what the traffic conditions are ahead, you turn on the car radio, tune-in to a traffic news station and listen to updates periodically. Similarly, if you want to know the status of SHARCNET systems or the latest SHARCNET news, events and workshops, you can turn to RSS feeds on your desktop computer.

The following SHARCNET RSS feeds are available:

The term RSS may stand for Really Simple Syndication, RDF Site Summary, or Rich Site Summary, depending on the version. Written in XML, RSS feeds are used by websites to syndicate their content. RSS feeds allow you to read through the news you want, at your own convenience. The messages will show up on your desktop, e.g. using Mozilla Thunderbird, an integrated mail client, as soon as there is an update. If you have a Gmail account, a convenient RSS access option may be Google Reader.

Can I have multiple roles?

Yes, though the behavior may differ depending on which systems (consortia) you are using. For SHARCNET we assume that your primary role indicates the group for which you'd like to associate your account with by default. You can select which of your active roles are primary when you apply for a new role, or from your Compute Canada account profile. At SHARCNET, your unix group membership will map to the group associated with your primary role, and by default jobs will be accrued to your primary role's group.

That being said:

  • You may still change the group ownership of files to your other role's group; see the chgrp and newgrp commands (a short example follows this list)
  • You can select which group for which you'd like a job to be accrued to with the sqsub -p flag, for example, if you have more than one sponsor and your non-primary sponsor has the username smith, you can attribute your usage to the smith project/accounting group like this:
 sqsub -o foo.log -r 5h -p smith ./foo
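
As a sketch (smith is the hypothetical non-primary group from the example above, and shared_project is a made-up directory name):

chgrp -R smith /work/$USER/shared_project   # hand the files over to the smith group
newgrp smith                                # start a shell whose default group is smith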

In terms of usage policy, you can use storage that is available to either group, but your personal storage will be limited as for any other user (you don't get a double quota for /work/$USER or /home/$USER). You can accrue CPU usage to either group, and your job will be affected by the fairshare status of the group to which the job is accounted.

Compiling and Running Programs

How do I compile my programs?

To make it easier to compile across all SHARCNET clusters, we provide a generic set of commands:

cc, c++, f77, f90, f95

and for MPI,

mpicc, mpic++, mpiCC, mpif77, mpif90, mpif95

On most of our clusters, what these commands actually invoke is controlled by the modules loaded. The default is the Intel compiler and the corresponding Openmpi library compiled for it. These are very reasonable choices for most programs.
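
For example, compiling a small serial C program and an MPI C program with these generic wrappers might look like the following (hello.c and foo_mpi.c are placeholders for your own source files):

cc -O2 -o hello hello.c            # serial program, built with the default compiler module
mpicc -O2 -o foo_mpi foo_mpi.c     # MPI program, built against the matching Openmpi module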

What compilers are available?

For a full listing of all SHARCNET compilers see the Compiler section in the web portal software pages.

The "default" SHARCNET compiler is the Intel compiler. It is installed on all of our systems and its module is loaded by default. To identify which compiler is the default, execute the command "module list".

Generic compiler commands (c++,cc,CC,cxx,f77,f90,f95) are actually aliases which invoke the underlying compiler. To see which compiler is actually called, you would execute:

[ppomorsk@orc-login1:~] c++ --version
c++ (ICC) 12.1.3 20120212
Copyright (C) 1985-2012 Intel Corporation.  All rights reserved.

To see which compiler executable the alias points to, you would do:

[ppomorsk@orc-login1:~] which c++
/opt/sharcnet/intel/12.1.3/snbin/c++
[ppomorsk@orc-login1:~] ls -l /opt/sharcnet/intel/12.1.3/snbin/c++
lrwxrwxrwx 1 root root 39 Sep 11 11:37 /opt/sharcnet/intel/12.1.3/snbin/c++ -> /opt/sharcnet/intel/12.1.3/icc/bin/icpc
[ppomorsk@orc-login1:~]

The corresponding MPI compiler commands (mpic++,mpicc,mpiCC,mpicxx,mpif77,mpif90,mpif95) are also available. What these are set to depends on the openmpi module loaded. When compiling MPI code, it is important that the openmpi module and compiler module match.

If you want to try another compiler, you should load the relevant module, after unloading the intel module. For example, to use open64 you would first unload the default Intel compiler module and then load the module for the desired version of open64.

module unload intel
module load open64/4.5.2

In general, you should choose to use the highest performance compilers. In the past GNU compilers have generally offered inferior performance in comparison to commercial compilers, but recently they have improved. If one plans to do extensive computations, it is advisable to compile code with different compilers and compare performance, then select the best compiler.

What standard (eg. math) libraries are available?

For a full listing of all SHARCNET software libraries see the Library section in the web portal software pages.

If you need to use BLAS or LAPACK routines you should consider using the ACML libraries and PathScale compilers on Opteron systems, and MKL and the Intel compilers on Intel hardware. ACML and MKL are vendor-optimized libraries that include BLAS and LAPACK routines. Refer to the ACML and MKL software pages for examples of their use.
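
As a rough sketch only (assuming the default Intel compiler module with its MKL is loaded, and using the Intel compilers' -mkl convenience flag; check the MKL software page for the exact link line on your system), a LAPACK-using C program might be built like this:

cc -O2 -o solver solver.c -mkl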

Relocation overflow and/or truncated to fit errors

If you get "relocation overflow" and/or "relocation truncated to fit" errors when you compile big fortran 77 codes using pathf90 and/or ifort, then you should try the following:

(A) If the static data structures in your fortran 77 program are greater than 2GB you should try specifying the option -mcmodel=medium in your pathf90 or ifort command.
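
For instance (prog.f standing in for your own Fortran 77 source file):

ifort -mcmodel=medium -o prog prog.f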

(B) Try running the code on a different system which has more memory:

   Other clusters that you can try are: requin or hound 

You would probably benefit from looking at the listing of all of the clusters:

https://www.sharcnet.ca/my/systems

and this page has a table showing how busy each one is:

https://www.sharcnet.ca/my/perf_cluster/cur_perf

How do I run a program?

In general, users are expected to run their jobs in "batch mode". That is, one submits a job -- the application problem -- to a queue through a batch queue command, the scheduler schedules the job to run at a later time and sends the results back once the program is finished.

In particular, one will use the sqsub command (see What is the batch job scheduling environment SQ? above) to launch a serial job foo

sqsub -o foo.log -r 5h ./foo

This means to submit the command foo as a job with a 5 hour runtime limit and put its standard output into a file foo.log (note that it is important to not put too tight of a runtime limit on your job as it may sometimes run slower than expected due to interference from other jobs).

If your program takes command line arguments, place the arguments after your program name just as when you run the program interactively

sqsub -o foo.log -r 5h ./foo arg1 arg2...

For example, suppose your program takes command line options -i input and -o output for input and output files respectively; these will be treated as the arguments of your program, not as options of sqsub, as long as they appear after your program in your sqsub command

sqsub -o foo.log -r 5h ./foo -i input.dat -o output.dat

If you have more than one sponsor and your non-primary sponsor has the username smith, you can attribute your usage to the smith project/accounting group like this:

sqsub -o foo.log -r 5h -p smith ./foo

To launch a parallel job foo_p

sqsub -q mpi -n num_cpus -o foo_p.log -r 5h ./foo_p

The basic queues on SHARCNET are:

 queue      usage
 serial     for serial jobs
 mpi        for parallel jobs using the MPI library
 threaded   for threaded jobs using OpenMP or POSIX threads

To see the status of submitted jobs, use command sqjobs.

How do I run a program interactively?

Several of the clusters now provide a collection of development nodes that can be used for this purpose. An interactive session can also be started by submitting a screen -D -m bash command as a job. If your job is a serial job, the submission line should be

sqsub -q serial -r <RUNTIME> -o /dev/null screen -D -fn -m bash

Once the job begins running, figure out what compute node it has launched on

sqjobs

and then ssh to this node and attach to the running screen session

ssh -t <NODE> screen -r

You can access screen's options via the ctrl+a key stroke. Some examples are ctrl+a ? to bring up help and ctrl+a a to send a ctrl+a. See the screen man page (man screen) for more information. The message Suddenly the Dungeon collapses!! - You die... is screen's way of telling you it is being killed by the scheduler (most likely because the time you specified for the job has elapsed). The exit command will terminate the session.

If your job is an MPI job, the submission line should be

sqsub -q mpi --nompirun -n <NODES> -r <RUNTIME> -o /dev/null screen -D -fn -m bash

Once the job starts, screen will be launched on the rank zero node. This may not be the lowest numbered node allocated, so you have to run

qstat -f -l <JOBID> | egrep exec_host

to find out which node it is (the first one listed). You can then proceed as in the non-MPI case. The command pbsdsh -o <COMMAND> can be used to run commands on all the allocated nodes (see man pbsdsh), and the command mpirun <COMMAND> can be used to start MPI programs on the nodes.


How do I submit an MPI job such that it doesn't automatically execute mpirun?

This can be done by using the --nompirun flag when submitting your job with sqsub. By default, MPI jobs submitted via sqsub -q mpi are expected to be MPI programs, and the system automatically launches your program with mpirun. While this is convenient in most cases, some users may want to implement pre or post processing for their jobs, in which case they may want to encapsulate their MPI job in a shell script.

Using --nompirun means that you have to take responsibility for providing the correct MPI launch mechanism, which depends on the scheduler as well as the MPI library in use. You can actually see what the system default is by running sqsub -vd ....

 system   mpi launch prefix
 most     /opt/sharcnet/openmpi/VERSION/COMPILER/bin/mpirun

NOTE: VERSION is the version number, COMPILER is the compiler used to compile the library, eg. /opt/sharcnet/openmpi/1.6.2/intel/bin/mpirun

Our legacy systems (eg. Goblin, Monk and others) are based on Centos, Torque/Maui/Moab and OpenMPI.

The basic idea is that you'd write a shell script (eg. named mpi_job_wrapper.x) to do some actions surrounding your actual MPI job (using requin as an example here):

#!/bin/bash
echo "hello this could be any pre-processing commands"
/opt/hpmpi/bin/mpirun -srun ./mpi_job.x
echo "hello this could be any post-processing commands"

You would then make this script executable with:

chmod +x mpi_job_wrapper.x

and submit this to run on 4 cpus for 7 days with job output sent to wrapper_job.out:

sqsub -r 7d -q mpi -n 4 --nompirun -o wrapper_job.out ./mpi_job_wrapper.x

now you would see the following output in ./wrapper_job.out:

hello this could be any pre-processing commands
<any output from the MPI job>
hello this could be any post-processing commands

On newer clusters, due to the extreme spread of memory and cores across sockets/dies, getting good performance requires binding your processes to cores so they don't wander away from the local resources they start using. The mpirun flags --bind-to-core and --cpus-per-proc are for this. If sqsub -vd ... shows these flags, make sure to duplicate them in your own scripts. If it does not show them, do not use them: they require special scheduler support, and without it, your processes will wind up bound to cores that other jobs are using.

There are a number of reasons NOT to use your own scripts as well: with --nompirun, your job will have allocated a number of cpus, but the non-MPI portions of your script will run serially. This wastes cycles on all but one of the processors - a serious concern for long serial sections and/or jobs with many cpus. "sqsub --waitfor" provides a potentially more efficient mechanism for chaining jobs together, since it permits a hypothetical serial post-processing step to allocate only a single CPU.

But this also brings up another use-case: your --nompirun script might also consist of multiple MPI sub-jobs. For instance, you may have chosen to break up your workflow into two separate MPI programs, and want to run them successively. You can do this with such a script, including possible adjustments, perhaps to output files, between the two MPI programs. Some of our users have done iterative MPI jobs this way, where an MPI program is run, then its outputs are massaged or adjusted, and the MPI program is run again. Strictly speaking, you can do whatever you want with the resources you allocate as part of a job - multiple MPI sub-jobs, serial sections, etc.

Some jobs need to know the allocated node names and the number of cpus on each node, for example in order to construct their own hostfile. This is possible using the '$LSB_MCPU_HOSTS' environment variable. You may insert the lines below into your bash script:

echo $LSB_MCPU_HOSTS 
arr=($LSB_MCPU_HOSTS)
echo "Hostname= ${arr[0]}"
echo "# of cpus= ${arr[1]}"

Then, you may see

bru2 4
Hostname= bru2
# of cpus= 4

in your output file. Utilizing this, you can construct your own hostfile whenever you submit your job.

The following example shows a job wrapper script (eg. ./mpi_job_wrapper.x ) that translates an LSF job layout to an OpenMPI hostfile, and launches the job on the nodes in a round robin fashion:

 #!/bin/bash
 echo 'hosts:' $LSB_MCPU_HOSTS
 arr=($LSB_MCPU_HOSTS)
 if [ -e ./hostfile.$$ ]
 then
                 rm -f ./hostfile.$$
 fi
 for (( i = 0 ; i < ${#arr[@]}-1 ; i=i+2 ))
 do
                 echo ${arr[$i]} slots=${arr[$i+1]} >> ./hostfile.$$
 done
 /opt/sharcnet/openmpi/current/intel/bin/mpirun -np 2 -hostfile ./hostfile.$$ -bynode ./a.out

Note that one would still have to set the desired number of processes in the final line (in this case it is only set to 2). This could serve as a framework for developing more complicated job wrapper scripts for OpenMPI on the XC systems.

If you are having issues with using --nompirun we recommend that you submit a problem ticket so that staff can help you figure out how it should be utilized on the particular system you are using.

How do I checkpoint/restart my program?

Assuming that the code is serial or multi-threaded (not MPI), you can use Berkeley Labs Checkpoint Restart software (BLCR) on legacy SHARCNET systems. Documentation and usage instructions can be found on SHARCNET's BLCR software page. Note that BLCR requires your program to use shared libraries (not be statically compiled).
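
A minimal sketch of the usual BLCR workflow follows (myprog and the checkpoint file name are placeholders; see the BLCR software page for the module to load and the exact options on each system):

cr_run ./myprog &                  # start the program under BLCR control
cr_checkpoint -f myprog.ckpt $!    # write a checkpoint of the backgrounded process to a file
cr_restart myprog.ckpt             # later, restart the program from that checkpoint file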

I can't run 'java' on a SHARCNET cluster?

Due to the way memory limits are implemented on the clusters, you will need to be specifying the maximum memory allocation pool for the Java JVM at the time you invoke it.

You do this with the -XmxNNNN command-line argument, where NNNN is the desired size of the allocation pool. Note that this number should always be within any memory limits being imposed by the scheduler (on orca compute nodes, that default limit would be 1GB per process).

The login nodes are explicitly limited to 1GB of allocation for any process, so you will need to run java or javac specifying a maximum memory pool smaller than 1GB. For example:

Running it normally produces the error:

orc-login2:~% java 
Error occurred during initialization of VM 
Could not reserve enough space for object heap 
Could not create the Java virtual machine.

Specify small maximum memory allocation:

orc-login2:~% java -Xmx512m 
Usage: java [-options] class [args...] 
                      (to execute a class) 
      or java [-options] -jar jarfile [args...] 
                      (to execute a jar file)

where options include:

       -d32 use a 32-bit data model if available 
       ...

As you can see, explicitly limiting the memory allocation pool to 512MB here has it running as expected.
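
The same principle applies when submitting Java programs to the batch queue: the -Xmx value should sit comfortably below the per-process memory requested from the scheduler. A sketch (the class name and memory figures are placeholders):

 # request 2G per process from the scheduler, but cap the JVM heap at 1500m
 # so the java process stays under that limit
 sqsub -q serial -r 1d --mpp 2G -o ofile.%J java -Xmx1500m MyApplication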

Software

For legacy SHARCNET systems the list of preinstalled packages (with running instructions) can be found on the SHARCNET software page.


Legacy SHARCNET account

Do I need a SHARCNET account?

It is very likely that you don't need one - check this info.

What is required to obtain a SHARCNET account?

Anyone who would like to use SHARCNET may apply for an account. Please bear in mind the following:

  • There are no shared/group accounts, each person who uses SHARCNET requires their own account and must not share their password
  • Applicants who are not faculty (eg. students, postdocs) require an account sponsor who must already have a SHARCNET account. This is typically one's supervisor.
  • There is no fee for academic access, but account sponsors are responsible for reporting their research activities to Compute Canada, and all academic SHARCNET users must obtain a Compute Canada account before they may apply for a SHARCNET account.
  • All SHARCNET users must read and follow the policies listed here

How do I apply for an account?

Applying for an account is either done through the Compute Canada Database (for academic users) or by contacting SHARCNET (for non-academic use). Each case is outlined below.

Note: If you have an existing SHARCNET account, do not apply for a new account. Each person should only ever have one account, which can be transferred to new groups, etc. See this knowledge base entry for further details.

Academic Users

Faculty, students, and staff at academic institutions, as well as their research collaborators, can apply for their SHARCNET account as follows:

  1. Acquire a Compute Canada Identifier (CCI) by applying for a Compute Canada account here.
    • NOTE: If you are not faculty (e.g. students, collaborators, research assistants) your sponsor must already have an account, and you will need to know their CCRI to complete the application form. Ask them for it as it is considered private information.
    • If you are not at a Canadian institution you can still get a CCI (and hence a SHARCNET account) as long as your sponsor has a CCI.
      • In this case specify Institution: Other... in the Compute Canada account application form.
  2. You will be sent a message to confirm your email address.
    • Check your spam folder if you don't see it.
    • Click on the link in the message to confirm.
  3. Once you confirm your email address, an authorizing authority will be sent an email requesting that they confirm your application.
    • If you are applying for a Faculty position account from a SHARCNET member institution then your SHARCNET Site Leader will authorize your account.
    • If your sponsor is from a SHARCNET member institution they will authorize your account.
      • Sponsors may need to check their spam folder to find the confirmation request.
    • If your account sponsor is not located at a SHARCNET member institution your local consortium in Compute Canada will approve your application based on their policy.
  4. Once your Compute Canada account is approved, you should either log back in to the Compute Canada Database where you originally applied for an account, and continue these instructions, or, if you previously held a SHARCNET account, please email help@sharcnet.ca so we can reactivate your old account (further information can be found in the instructions below concerning the linking of one's account).
  5. Navigate to the Consortium Accounts Page, it is in the "My Account" menu, listed as Apply for consortium account.
  6. Click the Apply button next to the word SHARCNET.
  7. Follow the instructions on the SHARCNET website to complete the application.
    • If you are not faculty, your sponsor needs a SHARCNET account (in addition to their Compute Canada account) before you can complete the SHARCNET application.
  8. A SHARCNET staff person will review your application before you receive cluster access.

You will be sent an email containing either your new account credentials and information on getting started with SHARCNET or the outcome of your application once it has been processed. Please note that it may take up to 1 business day to process your application once it has been successfully submitted (including all authorizations).

If you are having trouble with the above instructions please contact help@sharcnet.ca for assistance.

Non-academic Users

All other account requests (commercial access, non-academic, ineligible for a CCI, etc.) should be sent to help@sharcnet.ca. These are dealt with on a case-by-case basis and approved following consultation with SHARCNET Administration. If you are working outside of academia we recommend you read our Commercial Access Policy, which can be found in the SHARCNET web portal here.

I am changing supervisor or I am becoming faculty, and I already have a SHARCNET account. Should I apply for a new account?

No, you should apply for a new role (CCRI) at Compute Canada and indicate that you want your new role to be your primary role.

The process is as follows:

  1. apply for a new role at the Compute Canada Database under your new supervisor/position
    • Disable old roles as appropriate - by doing so you will not be asked to renew them each year at renewal time, so deactivating them up front avoids future hassle. That said, you may wish to leave some or all of them activated if your arrangement in prior roles has not changed and you'd like to continue computing in those groups. If in doubt, email accounts@computecanada.ca for direction.
    • Ensure that you click the check-box beside the question Make this role primary?
      • If you forgot to check off the Make this role primary option in the application form, email help@sharcnet.ca and we can update it for you.

Once your new role is confirmed by your sponsor and/or the Compute Canada account authority for your institution, your SHARCNET account will automatically change status/sponsor. Note that for SHARCNET institutions only the sponsor needs to confirm sponsored roles at Compute Canada, but for other institutions in Compute Canada the institutional account authority may also have to approve the role.

Please note the following caveats associated with changing account sponsor at SHARCNET:

Note that for sponsored accounts, your new sponsor needs to have an activated SHARCNET account before your account will be reactivated following the change in position. If your sponsor applies for their SHARCNET account after you've obtained your new role, you (or preferably, they) must email help@sharcnet.ca indicating that you'd like your account reactivated as it does not happen automatically.

file permissions

Your files will retain their old group ownership after the switch (although you may change them to your new group if you wish with the chgrp command).

email contact address

By default the email address associated with your account will not change to the one associated with your new role. If you wish to use your new email address, you must update your Contact Information in the Compute Canada Database to make your new address Primary.

I have an existing SHARCNET account and need to link it to a new Compute Canada account, how do I do that?

You first need to get a Compute Canada Role Identifier (CCRI) (steps 1-3) and then notify SHARCNET that you would like to link your Compute Canada Account (CCI) to your existing SHARCNET account.

Important Notes: If you are a sponsored user then your sponsor must complete this process before you can proceed to step 4. SHARCNET cannot give out account credentials, including CCRIs, due to data privacy concerns, so you must obtain this information from your sponsor.

  1. submit a Compute Canada Account Application
    • creating this account will also create a CCRI
    • Note: if you have an account sponsor you will need their CCRI to apply
    • Note the default username field: the form will not let you use your old SHARCNET username, so you will have to apply using a different name and then request that it be changed back to your old SHARCNET username as part of step 4 (we'll contact Compute Canada to do this). Alternatively, if you'd like to pick something new, ask that we update your username at SHARCNET to be your new Compute Canada default username when emailing us in step 4 below.
  2. confirm your email by clicking on the link in the email message you receive
    • you may have to check your spam folder for this message as it is automatically generated
  3. wait for your sponsor or siteleader to approve your account
    • your sponsor may need to check his or her spam folder to find the confirmation email.
    • depending on your local consortium, your consortium may need to approve your account after your sponsor approves it.
    • when approved, you'll receive email indicating your account is active
    • it will also contain your new CCRI
  4. email help@sharcnet.ca to request that your Compute Canada and SHARCNET accounts be linked

After linking your accounts there may be a modest delay before your status is updated in the SHARCNET account system. If you are logged into any clusters you should logout and back in again to update your ldap group membership, otherwise when you submit jobs they may fail with warning messages containing bsub status=65280. After linking accounts, SHARCNET will utilize your primary email address on file with Compute Canada for all communications. If your sponsor or position has changed since you had last accessed your SHARCNET account then you should review the notes in the above section concerning change of sponsor/supervisor.

If you encounter problems please email help@sharcnet.ca.

Using R on SHARCNET

Version Selection

To load the latest r/3.4.3 module, follow the steps below. We first unload the intel module since the sharcnet r modules are built with the system gcc compiler.

module unload r intel gcc openmpi
module load r/3.4.3
module load gcc/6.4.0   (optional)

Parallel Processing? Things to Know Before Getting Started

It is important to read this note before you use R on SHARCNET, or you may encounter surprises that you would not see on your personal computer or departmental machines. Keep in mind that R on SHARCNET is parallel-processing enabled. This means that with some R packages, R will run in parallel automatically, without you noticing. On a personal or departmental multicore machine this can be convenient, since it exploits the available cores; on SHARCNET, however, this automatic parallelism happens without the scheduler being aware of R's potential parallel behaviour, and that can cause unexpected problems. Please read the section Using The R Parallel Library.

Job Submission

After loading an r module (as shown above in Version Selection), R jobs can be submitted to the batch queueing system for execution. The following command shows how to submit an R job to the orca serial queue. If no [outfile] name is given then the output filename will be based on the infile name with a '.Rout' extension. For further details of the command arguments run "man R" to see the R manpage:

sqsub -r 2d -q serial -o ofile.%J R CMD BATCH [options] infile [outfile]

Starting with version 2.14.2, the r installation on sharcnet is threaded-lapack aware, hence R codes using functions based on lapack can benefit significantly from being submitted to the threaded queue. As an example of submitting an R job to the threaded queue using 8 cores one might use:

sqsub --mpp 2G -r 7d -q threaded -n 8 -o ofile.%J R CMD BATCH --no-save mycode.r

Installing Packages

R makes it fairly easy for users to install and maintain their own R package additions, so there is no need to have them installed centrally unless there is substantial demand across the user community. The following instructions will provide some assistance with users installing their own R packages.

  1. Create a directory in your file system to hold your R packages. This could be done under /home however we recommend using global work e.g. mkdir /work/$USER/R_local

  2. Set the R_LIBS environment variable to point at this directory; this adds it to the search path for libraries when you run R. You may want to add this export to your ~/.bashrc as a custom configuration, e.g. export R_LIBS=/work/$USER/R_local. Running .libPaths() in R should then produce something like the following, depending on which version of the r module you loaded:
    [roberpj@red-admin:~] R
    > .libPaths()
    [[1]] "/work/$USER/R_local"
    [[2]] "/opt/sharcnet/r/3.1.1/lib64/R/library"
    
    Now R is aware of both the system install directory and this new one, which you have write access to.

  3. You can now install R packages either by downloading the package, placing it in your "R_local" directory and executing the following at the system command-line (not from within R):
    % R CMD INSTALL ./name_of_downloaded.package

    Most packages also allow you to install directly from within R. For example, the statnet package instructs you to install using the following command from within R:

     > install.packages("updatestatnet",dependencies=TRUE,contriburl="http://cran.us.r-project.org")

    The effect is the same - the package will be downloaded and, by default, installed in your local R directory (it is possible to explicitly specify the lib directory with this command if desired).

    At this time, you may receive a warning that R was unable to create an HTML package index; this is unrelated to the function of the package and should not affect operation of the software. (A consolidated sketch of the full install workflow is shown after this list.)

  4. If the R package you are installing exceeds the memory or time limits of a cluster login node, then a sharcnet devel node should be used to build your package (such as orc-dev1 on orca). Note, however, that as these nodes do not have direct internet access, a Sys.setenv R command must be run before your install.packages command if access to an external url is required for downloading source packages. Assuming you are downloading from an https repository, run:
     > Sys.setenv(https_proxy="http://proxy.sharcnet.ca:3128")

    If you are downloading from an http repository mirror then run:

     > Sys.setenv(http_proxy="http://proxy.sharcnet.ca:3128")

    If you get the message connect: Network is unreachable after running Sys.setenv, you must stop and restart R before running the corrected Sys.setenv command; otherwise, rerunning different Sys.setenv commands in an attempt to fix the problem will continue to fail even when the correct command is used. Conversely, once a package has been successfully downloaded, rerunning an incorrect Sys.setenv command will still result in the apparent success of install.packages, since the package already resides in its temp directory from the current session, such as:

    The downloaded source packages are in
            ‘/local/tmp/RtmpbmwYdR/downloaded_packages’
    
  5. At this point you have installed your package; follow any additional instructions that may be provided by the package maintainer (such as running an update, etc.) If you should experience any problems installing your package, please contact one of the software analysts (or file a problem ticket) for assistance.
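
Putting the steps above together, a minimal end-to-end sketch looks like the following (the directory and package names are just the examples used above, and the proxy line is only needed when building on a devel node without direct internet access):

 # one-time setup: create a personal library and point R at it
 mkdir -p /work/$USER/R_local
 export R_LIBS=/work/$USER/R_local        # consider adding this to ~/.bashrc
 # option A: install a downloaded source package from the shell
 R CMD INSTALL ./name_of_downloaded.package
 # option B: install from a repository from within R (on a devel node,
 # set the proxy first so install.packages can reach the outside world)
 R
 > Sys.setenv(https_proxy="http://proxy.sharcnet.ca:3128")
 > install.packages("statnet", dependencies=TRUE, repos="http://cran.us.r-project.org")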

General Notes

Quick Numerical Accuracy Check

[roberpj@orc-login1:~] R
R version 2.10.0 (2009-10-26)
> log(exp(1),base=2)
[1] 1.442695

Check if R package is Installed

The following checks whether package abc is installed under one of the .libPaths() locations; either TRUE or FALSE will be returned.

[roberpj@orc-login1:~] R
> .libPaths()
[1] "/gwork/roberpj/R_local/el6"           
[2] "/opt/sharcnet/r/3.2.3/lib64/R/library"
> "abc" %in% rownames(installed.packages())
[1] TRUE

Removing the package then checking will result in FALSE being returned:

> remove.packages(c("gsl"),lib=file.path("/gwork/roberpj/R_local/el6"))
> "gsl" %in% rownames(installed.packages())
[1] FALSE

See also https://cran.r-project.org/doc/manuals/r-devel/R-admin.html#Removing-packages.

List All Installed R Package Versions

> ip <- as.data.frame(installed.packages()[,c(1,3:4)]);rownames(ip) <- NULL;
> ip <- ip[is.na(ip$Priority),1:2,drop=FALSE];print(ip, row.names=FALSE);
       Package   Version
    assertthat     0.2.0
            BH  1.66.0-1
         bindr       0.1
      bindrcpp       0.2
        bitops     1.0-6
 BradleyTerry2     1.0-8
         brglm     0.6.1
         broom     0.4.3
         caret    6.0-78
       caTools    1.17.1

Check Version of Specific R package

> library("caret")
Loading required package: lattice
Loading required package: ggplot2
> packageVersion("caret")
[1] ‘6.0.78’

Specify Default Repo Location

When installing package abc within R you may get the following failure message:

[roberpj@orc-login1:~] module load r/3.2.3
[roberpj@orc-login1:~] R
> install.packages("abc")
"package ‘abc’ is not available (for R version 3.2.3) 

Assuming the package is available from cran, specify a mirror from https://cran.r-project.org/mirrors.html then install the package by doing for example:

> install.packages("abc", repos="http://cran.us.r-project.org")

To make cran the default, create a ~/.Rprofile as explained in example(Startup); install.packages("abc") should then succeed, as follows:

[roberpj@orc-login1:~] cat .Rprofile 
## Set default repo
local({r <- getOption("repos");
       r["CRAN"] <- "http://cran.us.r-project.org"; 
       options(repos=r)})

Then, at the command line, run:

 Rscript -e 'install.packages("abc")'

Specify Dependency Package Locations

When installing an R package whose build depends on finding files from a locally installed sharcnet module package, or any other manually installed package, you will likely need to specify its location manually. For example, when installing the package "ncdf" interactively in R, you will either get an error complaining that "netcdf.h" cannot be found, or an older system version is found by default and the installation fails later because that version is out of date. A workaround for this "ncdf" example is as follows:

1) In your home directory $HOME, create a .R directory, if it does not exist:

cd
mkdir .R

2) In $HOME/.R, create a file Makevars, put the following two lines:

CPPFLAGS += -I/opt/sharcnet/netcdf/4.1.1/intel/include
LDFLAGS += -L/opt/sharcnet/netcdf/4.1.1/intel/lib

3) Execute the following command to load the latest R:

module switch r

4) Invoke R from command line and then within R, run

> install.packages("ncdf")

Further details are given in https://cran.r-project.org/doc/manuals/r-release/R-admin.html#Customizing-package-compilation.
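
A compact way to carry out steps 1 and 2 above from the shell (a sketch; the include and library paths are the ones from the example and should be adjusted to the module versions actually installed on your cluster):

# create ~/.R/Makevars pointing the compiler at the sharcnet netcdf module
mkdir -p $HOME/.R
cat > $HOME/.R/Makevars <<'EOF'
CPPFLAGS += -I/opt/sharcnet/netcdf/4.1.1/intel/include
LDFLAGS += -L/opt/sharcnet/netcdf/4.1.1/intel/lib
EOF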

Plot Generation Example

Please note this demo is just for example purposes. The unicode character included in the plot title is not yet supported.

[roberpj@laptop:~] ssh -X brown.sharcnet.ca
[roberpj@bro-login:~] module unload r intel gcc
[roberpj@bro-login:~] module load r/2.15.3
[roberpj@bro-login:~] R
> X11()
> plot(1:5,1:5)
> title(main="Plot Title \u{1D62}")
> dev.copy(png)
png 
  3 
> q()

[roberpj@bro-login:~] display Rplot001.png

Automatically Setting R_LIBS

Clusters guppy, mako and kraken still run the older rhel5 operating system. This means that any third party packages one installs on these systems using R will be built with gcc 4.1.2, instead of gcc 4.4.7 on rhel6 based clusters such as orca. Soon sharcnet will add a centos7 cluster, dusky. To manage R_LIBS automatically amongst so many operating systems, do the following:

o First create subdirectories under R_local with names corresponding to the operating system versions currently running on sharcnet clusters and workstations:

mkdir -p /work/$USER/R_local/{el5,el6,el7,fc20,fc23}

o Second copy-paste the following stanza into your ~/.bashrc file:

if   cat "/etc/redhat-release" | grep 5. >& /dev/null; then
   export R_LIBS=/work/$USER/R_local/el5
elif cat "/etc/redhat-release" | grep 6. >& /dev/null; then
   export R_LIBS=/work/$USER/R_local/el6
elif cat "/etc/redhat-release" | grep 7. >& /dev/null; then
   export R_LIBS=/work/$USER/R_local/el7
elif cat "/etc/redhat-release" | grep 20 >& /dev/null; then
   export R_LIBS=/work/$USER/R_local/fc20
elif cat "/etc/redhat-release" | grep 23 >& /dev/null; then
   export R_LIBS=/work/$USER/R_local/fc23
fi
export R_LIBS_USER=$R_LIBS

# echo statements in .bashrc can cause problems with ssh or sftp commands;
# though very convenient, uncomment the next line only at your own risk
#[ -t 0 ] && echo "R_LIBS = "$R_LIBS && echo "R_LIBS_USER = "$R_LIBS_USER && echo

How to Utilize SHARCNET Modules

This section shows how to use more recent library versions of mpfr and gmp, available as sharcnet modules (instead of the older system rpm versions), to install the R package https://cran.r-project.org/web/packages/MixGHD/index.html using a ~/.R/Makevars file. Since there is currently no automated way to load the mpfr and gmp sharcnet modules and transpose their LDFLAGS settings into the required rpath formulation, we manually create the entries in Makevars as shown.

If the steps below are followed exactly, then MixGHD and its 12 dependency packages should automatically build and install into R_LIBS=/work/$USER/R_local/el6 for use on sharcnet clusters. For demonstration purposes only, the R_LIBS directory is flushed in step 3. One artifact of this approach is that the user must keep track of which modules each package was built with; to ease this burden one could, for example, create a R_LIBS=/work/$USER/R_local/el6_mpfr-3.2.0_gmp_6.0.0 directory and similarly create a correspondingly named link under ~/.R, e.g. "ln -s Makevars_mpfr-3.2.0_gmp_6.0.0 Makevars". Such bookkeeping measures are left to the user's discretion and are not shown in the following example:

Step 1)   Log into a sharcnet cluster such as orca.

Step 2)
[roberpj@orc-login2:~] module unload intel; module load r/3.2.0;  echo $R_LIBS
/work/roberpj/R_local/el6

Step 3) 
[roberpj@orc-login1:/gwork/roberpj/R_local/el6] /bin/rm -rf *

Step 4)
[roberpj@orc-login1:~] cat .R/Makevars
PKG_CPPFLAGS = -I/opt/sharcnet/mpfr/3.1.2p8/include -I/opt/sharcnet/gmp/6.0.0/include
PKG_LIBS= -L/opt/sharcnet/gmp/6.0.0/lib -Wl,-rpath,/opt/sharcnet/gmp/6.0.0/lib -L/opt/sharcnet/mpfr/3.1.2p8/lib -Wl,-rpath,/opt/sharcnet/mpfr/3.1.2p8/lib -lgmp -lmpfr

Step 5)
[roberpj@orc-login2:~/samples/r/MixGHD] R 2>&1 | tee build.log
> install.packages("MixGHD", dependencies = TRUE)
Using: Canada (QC1)
~~ snip ~~ 
* installing *source* package MixGHD ...
** package MixGHD successfully unpacked and MD5 sums checked
** R
** data
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (MixGHD)

The following snippets from build.log indicate some details of what was done:


[roberpj@orc-login2:~/samples/r/MixGHD] cat build.log | grep DONE
* DONE (bitops)
* DONE (gtools)
* DONE (mvtnorm)
* DONE (numDeriv)
* DONE (mixture)
* DONE (e1071)
* DONE (gdata)
* DONE (caTools)
* DONE (Rmpfr)
* DONE (gplots)
* DONE (Bessel)
* DONE (ghyp)
* DONE (MixGHD)

[roberpj@orc-login2:~/samples/r/MixGHD] cat build.log
gcc -std=gnu99 -I/opt/sharcnet/r/3.2.0/lib64/R/include -DNDEBUG -I/opt/sharcnet/mpfr/3.1.2p8/include -I/opt/sharcnet/gmp/6.0.0/include
-I/opt/sharcnet/mkl/10.3.9/mkl/include    -fpic  -O2  -c Ops.c -o Ops.o
gcc -std=gnu99 -I/opt/sharcnet/r/3.2.0/lib64/R/include -DNDEBUG -I/opt/sharcnet/mpfr/3.1.2p8/include -I/opt/sharcnet/gmp/6.0.0/include
-I/opt/sharcnet/mkl/10.3.9/mkl/include    -fpic  -O2  -c Summary.c -o Summary.o
gcc -std=gnu99 -I/opt/sharcnet/r/3.2.0/lib64/R/include -DNDEBUG -I/opt/sharcnet/mpfr/3.1.2p8/include -I/opt/sharcnet/gmp/6.0.0/include
-I/opt/sharcnet/mkl/10.3.9/mkl/include    -fpic  -O2  -c convert.c -o convert.o
gcc -std=gnu99 -I/opt/sharcnet/r/3.2.0/lib64/R/include -DNDEBUG -I/opt/sharcnet/mpfr/3.1.2p8/include -I/opt/sharcnet/gmp/6.0.0/include
-I/opt/sharcnet/mkl/10.3.9/mkl/include    -fpic  -O2  -c init.c -o init.o
gcc -std=gnu99 -I/opt/sharcnet/r/3.2.0/lib64/R/include -DNDEBUG -I/opt/sharcnet/mpfr/3.1.2p8/include -I/opt/sharcnet/gmp/6.0.0/include
-I/opt/sharcnet/mkl/10.3.9/mkl/include    -fpic  -O2  -c utils.c -o utils.o
gcc -std=gnu99 -shared -L/opt/sharcnet/r/3.2.0/lib64/R/lib -L/opt/sharcnet/mkl/10.3.9/mkl/lib/intel64
-Wl,-rpath,/opt/sharcnet/mkl/10.3.9/mkl/lib/intel64 -o Rmpfr.so Ops.o Summary.o convert.o init.o utils.o -L/opt/sharcnet/gmp/6.0.0/lib
-Wl,-rpath,/opt/sharcnet/gmp/6.0.0/lib -L/opt/sharcnet/mpfr/3.1.2p8/lib -Wl,-rpath,/opt/sharcnet/mpfr/3.1.2p8/lib -lgmp -lmpfr
-L/opt/sharcnet/r/3.2.0/lib64/R/lib -lR
installing to /gwork/roberpj/R_local/el6/Rmpfr/libs

[roberpj@orc-login1:/gwork/roberpj/R_local/el6] ls
Bessel  bitops  caTools  e1071  gdata  ghyp  gmp  gplots  gtools  MixGHD  mixture  mvtnorm  numDeriv  Rmpfr


> .libPaths()
[1] "/gwork/roberpj/R_local/el6"
[2] "/opt/sharcnet/r/3.2.0/lib64/R/library"
> search()
[1] ".GlobalEnv"        "package:stats"     "package:graphics"
[4] "package:grDevices" "package:utils"     "package:datasets"
[7] "package:methods"   "Autoloads"         "package:base"
> library()
Packages in library /gwork/roberpj/R_local/el6:

Bessel                  Bessel -- Bessel Functions Computations and
                        Approximations
bitops                  Bitwise Operations
caTools                 Tools: moving window statistics, GIF, Base64,
                        ROC AUC, etc.
e1071                   Misc Functions of the Department of Statistics,
                        Probability Theory Group (Formerly: E1071), TU
                        Wien
gdata                   Various R Programming Tools for Data
                        Manipulation
ghyp                    A package on the generalized hyperbolic
                        distribution and its special cases
gmp                     Multiple Precision Arithmetic
gplots                  Various R Programming Tools for Plotting Data
gtools                  Various R Programming Tools
MixGHD                  Model Based Clustering, Classification and
                        Discriminant Analysis Using the Mixture of
                        Generalized Hyperbolic Distributions
mixture                 Mixture Models for Clustering and
                        Classification
mvtnorm                 Multivariate Normal and t Distributions
numDeriv                Accurate Numerical Derivatives
Rmpfr                   R MPFR - Multiple Precision Floating-Point
                        Reliable

Using The R Parallel Library

On SHARCNET, R is currently compiled to use the Intel MKL library, which provides implicit parallelism for many kinds of computation. If you are going to use an explicit parallelism library, such as library(parallel), you should turn off MKL threading, because the two mechanisms interfere; not doing so is known to be problematic on some clusters (mosaic) at the time of this writing. The solution is to set MKL_NUM_THREADS=1 before starting R, to force MKL to operate serially. While MKL also respects an OMP_NUM_THREADS=1 setting, that variable may undesirably affect other R packages. The complete example below demonstrates that the expected scaling is achieved when MKL_NUM_THREADS is set. Please remember to unset the variable when not using the parallel library - putting it permanently in your ~/.bashrc file, while acceptable, is therefore not recommended.

[roberpj@mos-login:~] echo $OMP_NUM_THREADS
[roberpj@mos-login:~] export MKL_NUM_THREADS=1
[roberpj@mos-login:~] module unload r intel gcc; module load r/3.2.1
[roberpj@mos-login:~] R
R version 3.2.1 (2015-06-18) -- "World-Famous Astronaut"
>  library(parallel)
>  set.seed(1000)
> test <- lapply(1:10,function(x) rnorm(100000))
> system.time(x <- mclapply(test,function(x) loess.smooth(x,x), mc.cores=1))
   user  system elapsed
  4.898   0.045   4.947
> system.time(x <- mclapply(test,function(x) loess.smooth(x,x), mc.cores=2))
   user  system elapsed
  2.371   0.050   2.420
>  system.time(x <- mclapply(test,function(x) loess.smooth(x,x), mc.cores=4))
   user  system elapsed
  5.865   0.137   1.521
> system.time(x <- mclapply(test,function(x) loess.smooth(x,x), mc.cores=8))
   user  system elapsed
  5.850   0.260   1.091
>
Save workspace image? [y/n/c]: n
[roberpj@mos-login:~]

Using Rmpi on SHARCNET

Rmpi usage is not currently supported on sharcnet. Researchers interested in using Rmpi are recommended to read the instructions by the Rmpi developer Hao Yu. Researchers who wish to contribute build and/or usage instructions to the sharcnet website may do so by visiting the Unsupported Software page and clicking Rmpi in the "User-Installed Packages" table; this will take you to a wiki page where you may enter your version-specific information.

More on Parallel Processing with R

The parallel package (formerly multicore) lets R do parallel processing through a number of parallel functions. But parallel processing is not limited to the parallel package: by default, when certain functions are called, R spawns multiple threads automatically. While this may sound like a convenient way to exploit the power of multicore machines, it can sometimes result in serious system or performance problems. Take the above example again: if one loads the R module and invokes R from the command line without setting the number of threads in any of the environment variables, the following result will appear

> set.seed(1000)
> test <- lapply(1:10,function(x) rnorm(100000)) 
> system.time(x <- lapply(test,function(x) loess.smooth(x,x)))
   user  system elapsed 
 62.330   0.036   5.257 

Note that the call library(parallel) was not made. The user time (accumulated CPU time) is nearly 12 times the elapsed time, which suggests that lapply() is executed across multiple threads/processes. By setting

export OMP_NUM_THREADS=1

and executing the same commands again, we see the following different timing results

> set.seed(1000)
> test <- lapply(1:10,function(x) rnorm(100000)) 
> system.time(x <- lapply(test,function(x) loess.smooth(x,x)))
   user  system elapsed 
  4.932   0.038   4.974 

Now the user time is consistent with the elapsed time (wall time).

The number of threads can be set through either of the environment variables OMP_NUM_THREADS and MKL_NUM_THREADS. The latter determines whether the underlying math libraries that R uses run in parallel with multiple threads. If neither is set, the underlying functions use the default, which is the number of physical processors available on the host.

These two environment variables can be unset with command

unset OMP_NUM_THREADS

and/or

unset MKL_NUM_THREADS

They can also be set to a desired value; for example, in the bash shell

export OMP_NUM_THREADS=2

With the number of threads set to two, we run the example above and get the following timing result

> set.seed(1000)
> test <- lapply(1:10,function(x) rnorm(100000)) 
> system.time(x <- lapply(test,function(x) loess.smooth(x,x)))
   user  system elapsed 
  9.720   0.035   4.911 

which suggests that two threads have consumed 9.72 seconds of CPU time, while the elapsed time is 4.911 seconds.

The user must hence be extremely careful when using the parallel capabilities of R. With automatic threading enabled, R will spawn threads (equal to the number set by the environment variables OMP_NUM_THREADS or MKL_NUM_THREADS) wherever applicable. Mixing this default threading feature with the forking of multiple processes via e.g. the parallel function mclapply() may cause serious performance and system problems (e.g. the program hangs).
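
In practice, when a job uses explicit parallelism such as mclapply(), it is safest to pin the implicit threading to a single thread before R starts. A submission sketch follows (the core count and script name are placeholders; it assumes, as is usual with LSF-based submission, that the exported environment is passed through to the job - if it is not on your system, set the variable in a small wrapper script instead):

 # force MKL to run serially so implicit and explicit parallelism do not multiply
 export MKL_NUM_THREADS=1
 # request 8 cores for the mclapply() workers
 sqsub --mpp 2G -r 1d -q threaded -n 8 -o ofile.%J R CMD BATCH --no-save mycode.r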

Building Your Own Version Of R

Sharcnet usually has the latest version of r installed as a module. However if for some reason you want to build your own version of R, the following steps are a good starting point:

1. Unload intel compiler module

R depends heavily on system-installed packages. We have only built R using GCC and system-installed packages. To unload the intel module:

module unload intel

2. Load GCC compiler

R will build with the system compiler, or a newer compiler can be used. To use the latest version of the gcc compiler installed, do:

module load gcc
gcc -v

which returns gcc version 7.2.0 (GCC).

3. Load R dependency modules:

When configure runs there will be several points of failure due to old versions of packages in the centos6 OS. To get around this, one can simply module load more recent versions provided by the sharcnet software stack:

module load zlib bzip2 curl xz pcre

where xz provides liblzma.

4. Now, you are ready to compile and install R

Suppose you downloaded and unpacked R 3.4.1 in ~/Downloads, and you want to install your version of R in $HOME/software/R-3.4.1. Run the following commands

cd ~/Downloads/R-3.4.1 
./configure --prefix=$HOME/software/R-3.4.1 --enable-R-shlib --enable-threads=posix

This should get you finished with the configuration. Then, run

make 
make install

That's it!
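
To use the freshly built copy instead of the module version, prepend its bin directory to your PATH (a sketch, assuming the --prefix used above):

export PATH=$HOME/software/R-3.4.1/bin:$PATH
which R        # should now report $HOME/software/R-3.4.1/bin/R
R --version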

Storage

What storage is available

The Sharcnet clusters have a variety of storage systems available to users. Which one you place your files into depends on your specific needs. In all cases, user directories are stored as (filesystem)/(userid), where the user "sharcnet" would find their home directory as /home/sharcnet, their work directory as /work/sharcnet, and scratch as /scratch/sharcnet. The only exceptions to this are on the kraken login nodes, where sub-cluster specific scratch directories are stored as /scratch/(subcluster)/(user), thus placing the sharcnet user's scratch directory for whale nodes on /scratch/wha/sharcnet while on kraken login nodes. The kraken compute nodes follow the standard /scratch/(userid) pattern.

Below is a list of the filesystems that are available on the Sharcnet clusters:

  1. /home
    • Space: 10 GB
    • Purpose: Storage of configuration files, and small source code trees
    • Available on: all login and compute nodes, and Visualization machines
    • Quota type: Hard limit - once exceeded, no more files can be written
  2. /scratch
    • Space: Variable depending on cluster
    • Purpose: Temporary storage for fast access to files for data processing. /scratch storage should absolutely not be used as a long term storage location.
    • Available on: all login and compute nodes of an individual cluster have access to that cluster's scratch filesystem. Kraken nodes use the scratch for their local sub-cluster, and all kraken sub-cluster scratches are available on the login nodes. Visualization machines have individual local scratch filesystems.
    • Quota type: Timed expiry - files unchanged for 62 days are automatically removed.
  3. /work
    • Space: Global work (on most clusters) has 1TB, local work (on clusters mako and requin) has 200GB quota
    • Purpose: Long term storage of source code, program code, and data that is being actively used
    • Available on: all login and compute nodes - the mako cluster has access only to its own /work directories, requin uses local /oldwork and mounts global work as /gwork, and all other clusters and Visualization machines mount global work as /work
    • Quota type: Soft limit - once exceeded, limits on cluster resources are enforced until usage is below limits again
  4. /freezer
    • Space: 2 TB
    • Purpose: Long term storage of data that is not currently being used, but may be needed later
    • Available on: All cluster login nodes
    • Quota type: 2 years expiry
  5. /tmp
    • Space: Small, varies by cluster and node
    • Purpose: Very short term, local data storage during calculations. /tmp files can not be relied on to remain past the end of a job's run.
    • Available on: node-local storage, each node has an independent /tmp drive which is not accessible across the cluster, or on login nodes.
    • Quota type: Periodic purging of /tmp drive between running jobs.

Quota and how it works

Space usage on the home and work filesystems is monitored through a quota system. To see your current usage according to this system, you can use the quota command when logged into a cluster or visualization machine, like this:

[sharcnet@req769:~] quota
Filesystem           Limit       Used            File Count   Checked
jones:/home          10 GB      *11.3 GB (112%)        1,986    12h ago
lundun:/work         1 TB        20.9 MB (0%)        323,313   10h ago

The meanings of the sections of the output are as follows:

Filesystem: Indicates the cluster, and directory on which the user's data in question is stored. The special clusters 'lundun' and 'gulf' represent the global work directories, which are accessible across all of Sharcnet's clusters and visualization machines.

Limit: Indicates the maximum amount of storage space you are allowed to access on the filesystem in question.

Used: Indicates the amount of space currently occupied by your files on the indicated filesystem. Any entry which is over the limit will be marked with a * - in the displayed example above, the sharcnet user is over their /home quota limit.

File Count: Indicates the total number of files contained in your directory on the filesystem in question.

Checked: Indicates how long ago the most recent complete usage check was finished on the indicated filesystem. Quota scans are typically run every 24 hours, starting just after midnight, and depending on which filesystem and cluster can take anywhere from 5 minutes to several hours to complete.

Additionally, if your account has had resource limitations applied to it due to being over quota on a /work filesystem for too long, a warning about this will be displayed before the regular output of the quota command.

As monitoring your quota is a good idea, you may want to add a quota line to your .bashrc file to display your current usage every time you log into a cluster, so that you will become aware of any overages as soon as you log in.
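
For example, the following line at the end of ~/.bashrc runs quota only for interactive logins (the same [ -t 0 ] guard mentioned earlier for echo statements, which avoids interfering with sftp and other non-interactive sessions):

# show current usage at login, but only when connected to a terminal
[ -t 0 ] && quota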

Last, if you are the owner of a Dedicated Resources project, your DR project directory will also be listed in the output from the quota command, and will be labelled as such, like this:

[sharcnet@req769:~] quota
Filesystem               Limit       Used            File Count   Checked
jones:/home              10 GB      *11.3 GB (112%)       1,986    12h ago
lundun:/work             1 TB        20.9 MB (0%)       323,313   12h ago
gulf:/work/nrap12345     15 TB       12.2 TB (81%)    1,343,636   13h ago


Home Quota

Effects of Overage

Typical /home quota is 10 GB per user and is enforced as a hard limit. This means that when your usage exceeds the allowed space, you will not be able to write additional files to your home directory and will receive write errors if you attempt to do so. No other restrictions are placed on accounts which have exceeded their quota on the /home filesystem; however, as the default job submission places log entries and some output from your job into your home directory, that output may be lost if you are over your home quota when the job completes.

Fixing Overage

To correct overages in your home directory, log into a Sharcnet cluster and remove files from your home directory, either by deleting them or by moving them to the /work or /freezer filesystems. Once your usage is below the limit, you will be able to make use of the directory again immediately, regardless of the output from the quota command, which only updates once per day.

To verify the amount of space used in your directory immediately, you can use this command:

[sharcnet@bul129:~] du -sh /home/sharcnet
124M     /home/sharcnet

One thing to note - the du command can sometimes take some time to run on the filesystem, so you may need to be patient with it.
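
If it is not obvious where the space is going, a per-directory breakdown sorted by size can help narrow it down (a sketch; note that dot-files and dot-directories are not matched by the * glob):

# show the size in KB of each top-level item in your home directory, largest last
du -sk /home/$USER/* | sort -n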

Work Quota

Effects of Overage

Global /work directories have a quota of 1TB, and cluster-local /work filesystems have quotas of 200GB, enforced as a soft limit: you can still write files to your directory while over quota, but other resource limitations are placed on your account when you are found to be over quota.

Temporarily exceeding your /work quota at the end of a job and removing the excess files immediately will have no effect on your account. Quota scans begin just after midnight, and as long as your usage is back below your quota by then, the system will not even notice that there was an issue.

If your /work usage is still over quota when the nightly filesystem scan runs, your account will be flagged as over quota, and placed into a 3-day grace period before resource limits are applied. This will cause your usage of that filesystem in the 'quota' command's output to be marked with a *, but otherwise will not place any other limits on your account, or your ability to submit and run jobs, to allow you time to clean up your excess usage without penalty.

If your /work usage is over quota for more than three days, your account will be flagged as over quota, and resource limitations will be applied which prevent you from submitting new jobs to the clusters and prevent any already-queued jobs from running. You can verify whether your account has had resource limits placed on it by using the groups command while logged into a cluster and checking if the group 'ovrquota' is in the resulting list, like this:

[sharcnet@gb2:~] groups
sharcnet certlvl1 guelph_users fairshare-1 ovrquota

If you have jobs currently running or queued when your account is noted as being over quota, a warning email will be sent to your Sharcnet account informing you of the overage. Additionally, if your account is over quota at the end of the week (Sunday night), a warning email will be sent to your Sharcnet account even if no jobs are currently running.

Fixing Overage

If your account has had resource limitations placed on it due to being over quota for more than three days, there are two steps to correcting the problem:

First, you need to remove sufficient files from your directory to get your actual usage below your quota - this can be done by just deleting the files, copying them to your own local storage and deleting them, or by moving files to /freezer for long term storage. You can verify that your usage is below the limit by using the du command, like this:

[sharcnet@saw-login1:~] du -sh /work/sharcnet
843G     /work/sharcnet

As with the /home filesystem, the du command can sometimes take a considerable amount of time to run, especially if you have a large number of files and directories.

The second step in getting the over quota resource limitations removed from your account is to wait for the next quota scan to complete, at which time your overquota status will be cleared, and the 'ovrquota' group will be removed from your account.

If you have left any login sessions running from when you were over quota, you may need to log out of those sessions, and log back in in order to completely clear the ovrquota status. If this is the case, then the 'quota' command will report you as not being over on your usage, but the 'groups' command will still show you belonging to the 'ovrquota' group.

Scratch Expiration

The scratch filesystems are intended for short term storage of data files to provide local filesystem access for each cluster, which allows faster file access than the global /work filesystem. Because there is no space limitation, the scratch filesystems work on an age-based expiration system, where files that have not been altered for more than the expiration period are removed as "stale". The current expiration time for scratch filesystems is 62 days.

Because of the expiration, it is important to remember that you should not use /scratch for long term storage, or storage of important output files - any files produced by jobs that you need for long term should be moved out of /scratch into your /work or /freezer directory immediately after your job has completed.
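
For example, immediately after a job finishes, results can be moved out of the expiring scratch area into long-term storage (a sketch with a placeholder directory name):

# move finished output from /scratch to /work so it is not expired
mv /scratch/$USER/run42_output /work/$USER/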

Additionally, due to the lack of space usage limitations on the /scratch filesystem, it is possible for a few users to occupy the entire scratch filesystem, causing all of the users on that cluster to be unable to write to the drive. To check that space is available, you can use this command:

[sharcnet@orc-login1:~] df -h /scratch/
Filesystem            Size  Used Avail Use% Mounted on
10.27.9.132@o2ib:10.27.9.133@o2ib:/orcalfs
                      54T   47T  6.9T  88% /orc_lfs

In this case, the output indicates that the scratch filesystem on the Orca cluster has a total of 54TB, 47TB of which is currently in use, leaving only 6.9TB available for other users, with the filesystem being 88% full. A good practice would involve removing your old data files after your job has completed, to ensure that the filesystem does not become excessively filled with unwanted data.

Archive Storage

Note: recently the /archive filesystem has been discontinued and replaced by /freezer

Long term archival storage of files on Sharcnet clusters is provided through the /freezer filesystem. Please note: unlike our old /archive file system, the new /freezer file system has both a size quota (2TB; going over the quota results in your submitted jobs not running - same as with /work), and an expiry: after 2 years your files will be deleted. See our storage policies page for details. The /freezer filesystem is accessible only from the login nodes of the various clusters, and to use it, you simply move the files you wish to archive into your /freezer directory. As an example, the "sharcnet" user, wishing to move an entire directory called My_Results from their /work directory to /freezer, would do so by logging into the cluster, and using these commands:

[sharcnet@gup-hn:~] cd /work/sharcnet
[sharcnet@gup-hn:/work/sharcnet] mv My_Results /freezer/sharcnet/

Please note that if you are moving files into /freezer, you should either use the mv command to move the files, or delete the old files after copying them.

For moving substantial amounts of data, users should use the machine dtn.sharcnet.ca, which is a dedicated data transfer node.
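
For instance, assuming dtn.sharcnet.ca accepts the same ssh credentials as the login nodes, a results directory could be pulled down to your own machine with rsync (a sketch; the username and paths are placeholders):

# run from your own workstation: copy results from global work via the data transfer node
rsync -av --progress username@dtn.sharcnet.ca:/work/username/My_Results/ ./My_Results/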

Additionally, for source code storage and backups, SHARCNET has set up a GIT repository, which has usage instructions here: https://www.sharcnet.ca/help/index.php/Git

Requesting an Exemption

Users who have need of a larger amount of storage space can submit a request for additional space to help@sharcnet.ca for an extension. The request should include:

  • Which filesystem the quota extension is requested to be on (global work, requin work, mako work, home directory, or scratch timeout period)
  • What the space will be used for
  • Why the space is needed, rather than simply placing most of the data in /freezer

Advice for best practices

Watch your quota

The first and most important step in maintaining your disk usage on Sharcnet is to keep a close eye on how much space you are using with the quota command. The easiest way to do this is to add it to the end of the .bashrc file in your home directory, so that your quota status is displayed every time you log into a Sharcnet cluster or visualization machine.


Clean up unused files

Any data which you are not currently using for your work should be moved to /freezer storage - this can save you from headaches of running out of space in your work directory, and also keeps the scratch filesystems free of un-needed data, and available for everyone.

Any data you no longer need at all should be deleted.


Small numbers of large files are better than large numbers of small files

Large numbers of files in a single directory will cause file access to be slow for that directory. To prevent this, first you should try to make use of sub-directories to divide up your files into more manageable groups. Generally, you can place around 1000 files in a directory before access to the directory starts to get slowed by the number of files.

For archiving, large collections of small files should be archived together with the tar command, like this:

[sharcnet@mako2:/work/sharcnet] tar -c -v --remove-files -f /freezer/sharcnet/worlds-20090324.tar world*

This would collect all of the files with names starting with "world" into a single archive file named "/freezer/sharcnet/worlds-20090324.tar", indicating that they were "world" files from March 24th, 2009. The "/freezer/sharcnet/" part means that this file will be created in the /freezer filesystem, thus also removing it from occupying disk quota.

The parameters of the tar command have the following meanings:

"-c" means "Create", this is used to create a new archive.

"-v" means "Verbose", this causes tar to print out a list of all of the files it is adding to the archive, as it adds them.

"--remove-files" causes tar to delete the old versions of the files after they are successfully added to the archive file.

"-f worlds-20090324.tar" causes tar to output the results to the file worlds-20090324.tar. Without this, tar will simply display the contents of the file to the screen, instead of actually creating the archive file.

Optionally, the "-z" option can be added to compress your files as they are archived - if you do this, you should add .gz to the end of your archive file name.

To view a list of the files contained in an archive, you can use this command:

[sharcnet@mako2:/work/sharcnet] tar -t -v -f /freezer/sharcnet/worlds-20090324.tar
 -rw-r--r-- sharcnet/sharcnet 4529 2009-03-22 09:44:21 world-novirt0
 -rw-r--r-- sharcnet/sharcnet 4515 2009-03-22 09:43:02 world-novirt1
 -rw-r--r-- sharcnet/sharcnet 29850 2009-03-22 09:28:18 world-yesvirt0
 ...

Using the "-t" parameter instead of "-c" tells tar to list the contents of the file, rather than creating a new file. If you compressed your archive with "-z", you will also need to include this in the command to list it's contents.


Lastly, to extract the files into your current directory, you would use:

[sharcnet@mako2:/scratch/sharcnet] tar -x -v -m -f /freezer/sharcnet/worlds-20090324.tar

In this case, the two changed parameters are as follows:

"-x" causes tar to extract the archived files into whatever directory you are currently in. In the above example, we are extracting them into the /scratch/sharcnet directory on the mako cluster.

"-m" is important to use when extracting archived files to /scratch, as it resets the creation time on those files to the current time, so that they will not accidentally be expired early while we are still using them.

Again, if your archive file was compressed with the "-z" option, you will need to include it in the command to extract the files.