[[Image:Sharcnet_logo.jpg|center]]
 
<center><big>'''Frequently Asked Questions'''</big></center>
+
<center><big>'''Knowledge Base / Expanded FAQ'''</big></center>
 
+
<b> '''Note''': Most of the old FAQ entries have been removed as they are now covered at the Compute Canada level: https://docs.computecanada.ca/wiki/Frequently_Asked_Questions
 +
Some of the information on this page was moved to the [[Legacy Systems]] page.</b>
 
----
 
  
 
__TOC__
 
  
 +
<!--To make this long page more manageable, each major section has its content on its own separate page, which is then transcluded into this one.  Please edit the pages listed below to see the change reflected on this page-->
 +
{{:FAQ: About SHARCNET}}
 +
{{:FAQ: Getting an Account with SHARCNET and Related Issues}}
 +
{{:FAQ: Getting Help}}
 +
{{:FAQ: Research at SHARCNET}}
 +
{{:FAQ: Contacting SHARCNET}}
 +
{{:FAQ: How to Acknowledge SHARCNET in Publications}}
 +
{{:FAQ: What types of research programs / support are provided to the research community?}}
  
== About SHARCNET ==
+
<!--checked2015--><!-- This page is general information. -->
 
+
=== What is SHARCNET? ===
+
SHARCNET stands for Shared Hierarchical Academic Research Computing Network. Established in 2000, SHARCNET is the largest high performance computing consortium in Canada, involving eleven universities and colleges across southern Ontario.
+
 
+
SHARCNET also refers to a [http://www.sharcnet.ca/Facilities/network.php grid] of high performance clusters with thousands of processors on a dedicated, private high speed wide area network with a throughput of 1 gigabit per second. Powered by the [http://www.orion.on.ca/ Ontario Research Innovation Optical Network] ([http://www.orion.on.ca/ ORION]) and state-of-the-art operating system environments, the SHARCNET grid enables researchers to run a single parallel application seamlessly across multiple clusters deployed at different institutions.
+
 
+
=== Where is SHARCNET? ===
+
The [http://www.sharcnet.ca/my/contact main office] of SHARCNET is located in the Western Science Centre at [http://www.uwo.ca/ The University of Western Ontario]. The SHARCNET high performance clusters are installed at each institution of the consortium and operated by SHARCNET staff across different sites.
+
 
+
=== What does SHARCNET have? ===
+
The infrastructure of SHARCNET consists of a group of 64-bit high performance Itanium2, Xeon and Opteron clusters along with a group of storage units deployed at a number of universities and colleges. These clusters are interconnected through the [http://www.orion.on.ca/ Ontario Research Innovation Optical Network] ([http://www.orion.on.ca/ ORION]) with a private, dedicated connection currently running at 1 gigabit per second. SHARCNET clusters run the Linux operating system.
+
 
+
=== What can I do with SHARCNET? ===
+
If you have a program that takes months to run on your PC, you could probably run it within a few hours using hundreds of processors on the SHARCNET grid, provided your program is inherently parallelisable. If you have hundreds or thousands of test cases to run through on your PC or on computers in your lab, then running those cases independently on hundreds of processors will significantly shorten your test cycles.
+
 
+
If you have used Beowulf clusters made of commodity PCs, you may notice a performance improvement on SHARCNET clusters, which have high-speed Quadrics and Myrinet interconnects, as well as on SHARCNET machines that have large amounts of memory. Also, SHARCNET clusters themselves are connected through a dedicated, private connection over the [http://www.orion.on.ca/ Ontario Research Innovation Optical Network] ([http://www.orion.on.ca/ ORION]).
+
 
+
If you have access to supercomputing facilities elsewhere and you wish to share your ideas with us and with SHARCNET users, please contact us. Together we can make SHARCNET better.
+
 
+
=== Who is running SHARCNET? ===
+
 
+
The daily operation and development of SHARCNET computational facilities is managed by a group of highly qualified system administrators. In addition, we have a team of high performance technical computing consultants, who are responsible for technical support on libraries, programming and application analysis.
+
 
+
=== How do I contact SHARCNET? ===
+
For technical inquiries, you may send E-mail to [mailto:help@sharcnet.ca help@sharcnet.ca], or contact your local system administrator or HPC specialist. For general inquiries, you may contact the SHARCNET [http://www.sharcnet.ca/my/contact main office].
+
 
+
=== My application runs on Windows, can I run it on SHARCNET? ===
+
It depends. If your application is written in a high level language such as C, C++ or Fortran and is system independent (meaning it does not depend on any third party libraries that are available only for Windows), then you should be able to recompile and run your application on SHARCNET systems. However, if your application depends entirely on software that exists only for Windows, then you are out of luck. In general it is impossible to convert code at the binary level between Windows and any of the UNIX platforms.
+
 
+
=== My application runs on Windows 2000 clusters, can I run it on SHARCNET clusters? ===
+
If your application does not use any Windows-specific APIs, then you should be able to recompile it and run it on SHARCNET's UNIX/Linux based clusters.
+
 
+
 
+
== Getting an Account with SHARCNET and Related Issues ==
+
 
+
=== What is required to obtain a SHARCNET account? ===
+
Anyone who would like to use SHARCNET may apply for an account, bearing in mind the following:
+
 
+
* There are no shared/group accounts: each person who uses SHARCNET requires their own account.
+
* Primary accounts are available to faculty at Canadian institutions; non-Canadian faculty may be approved by the Scientific Director. The primary account holder is responsible for providing SHARCNET with citations for publications aided by SHARCNET resources.
+
* Students, postdocs and research fellows must be "sponsored" by a faculty-level account-holder, and require the SHARCNET username of their sponsor when applying for an account.
+
* All applicants must have an institutional email account that we can use for contact purposes; commercial email accounts such as Yahoo or Hotmail are not acceptable.
+
* All SHARCNET users must read and follow the policies listed [http://www.sharcnet.ca/my/systems/policies here].
+
 
+
=== How do I apply for an account? ===
+
Click [http://www.sharcnet.ca/my/security/new_account here] to register for a SHARCNET account.
+
 
+
Your application will be processed within 24 hours and you will receive an E-mail notification of whether your application is approved or declined.
+
 
+
Please do not send E-mail or call to request an account, as the system has no way to process such a request.
+
 
+
=== Can I just have a cluster account without having a web portal account? ===
+
No. The web portal account is a web interface to your account in our user database. It also gives you a way to manage your information and keep track of problems you have had, which is useful for troubleshooting if you encounter the same type of problem again.
+
 
+
=== Can I E-mail or call to open an account? ===
+
No. The system has no way to process your request. You must apply through the [http://www.sharcnet.ca/my/security/login web portal].
+
 
+
=== OK, I've seen and heard the word "web portal" enough, what is it anyway? ===
+
A web portal is a web site that offers online services. Usually a web portal has a database at the backend, in which people can store and access personal information. At SHARCNET, registered users can log in to the web portal, manage their profiles, apply for computer accounts at specific sites, and submit and review programming and performance related problems.
+
 
+
=== I have opened an account at the web portal, do I need to apply for a cluster account? ===
+
For people who acquired an account after January 2004, accounts on the various SHARCNET systems are created automatically. Those who had an account prior to January 2004 may need to apply for accounts on specific systems.
+
 
+
=== In the account application form, what should I fill in the "sponsor" field? ===
+
If you are a faculty member, then leave the field "sponsor" blank. Otherwise, the sponsor field is the SHARCNET username of your supervisor or collaborator.
+
 
+
=== My supervisor does not have an account, so my application can't go through, what should I do? ===
+
If your supervisor does not have an account yet, please ask your supervisor to apply for an account first.
+
 
+
=== My supervisor forgot all about his/her username, so my application can't go through, what should I do? ===
+
Send an E-mail to [mailto:help@sharcnet.ca help@sharcnet.ca]. If your supervisor does have an account, we will look up the username for you.
+
 
+
=== My supervisor does not use SHARCNET, why is my supervisor asked to have an account anyway? ===
+
Your supervisor's account ID is used to identify which group your account belongs to. This is how the user account management system is designed, so it is mandatory.
+
 
+
=== I am a faculty member. In the application for an account, what should I fill in the field "sponsor"? ===
+
If you are a faculty member with one of the SHARCNET institutions, you should leave the field "sponsor" blank.
+
 
+
=== I am a visiting scholar, in the application for an account, what should I fill in the field "sponsor" ? ===
+
You should fill in the user name of the person who invites you.
+
 
+
=== I am changing supervisor or I am becoming faculty, and I already have a SHARCNET account. Should I apply for a new account? ===
+
No. Send all of the details to [mailto:help@sharcnet.ca help@sharcnet.ca], and we will update your account.
+
 
+
=== Is there any charge for using SHARCNET? ===
+
No, not at this time.
+
 
+
=== I have an account at a SHARCNET site, do I automatically have an account at rest of the sites? ===
+
If you acquired your account before January 1, 2004, you must apply for a cluster account on each specific cluster. If you acquired your account after January 1, 2004, accounts are created for you at all sites automatically.
+
 
+
=== I forgot my password ===
+
If you forgot your password you can always get it reset by clicking the "Forget password" button on the login page, or click [http://www.sharcnet.ca/my/security/forgot here].
+
 
+
=== I forgot my username ===
+
If you forget your username, please send an E-mail to [mailto:help@sharcnet.ca help@sharcnet.ca]. Your username for the web portal and cluster account are the same.
+
 
+
=== My account has been disabled (so I cannot log in). What should I do? ===
+
Typically this means your sponsor did not renew your account before the "Account expiration" date shown on your [https://www.sharcnet.ca/my/profile/profile profile]. To fix this, ask your sponsor to visit [https://www.sharcnet.ca/my/profile/sponsored sponsored] and click enable to renew your account. Note that while your account is disabled you will not be able to log in to any SHARCNET cluster or to the SHARCNET web portal!
+
 
+
 
+
== Logging in to Systems, Transferring and Editing Files ==
+
 
+
=== How do I login to SHARCNET? ===
+
There is no single point of entry at present. "Logging in to SHARCNET" means logging in to one of the SHARCNET systems. A complete list of SHARCNET systems can be found on [https://www.sharcnet.ca/my/systems our facilities page].
+
 
+
To log in to a system, you need to use a [http://www.openssh.com/ Secure Shell (SSH)] connection. If you are logging in from a UNIX machine, make sure it has an SSH client (ssh) installed. If you have the same login name on both your local system and SHARCNET, and you want to log in to, say, <tt>bull</tt>, you may use the command
+
 
+
ssh bull.sharcnet.ca
+
 
+
If your SHARCNET username is different from the username on your local systems, then you may use either of the following commands
+
 
+
ssh bull.sharcnet.ca -l username
+
ssh username@bull.sharcnet.ca
+
 
+
If you want to establish an X window connection so that you can use graphical applications such as <tt>gvim</tt> and <tt>xemacs</tt>, you can add the option <tt>-Y</tt> at the end of the command, e.g.
+
 
+
ssh bull.sharcnet.ca -l username -Y
+
 
+
There is no need to set X display on the host you login to.
+
 
+
If you are logging in from a Windows or Mac machine, you need to have an [[ssh for Windows Users|SSH client for Windows]] or for Mac installed.
+
 
+
=== How can I access SHARCNET machines from a Windows PC? ===
+
 
+
There are basically two options. One is the Secure Shell (SSH) File Transfer Client, available at www.ssh.com, which lets you transfer files between your desktop and a remote host by dragging them from one side to the other. There are also text-based programs (e.g. putty, psftp, pscp).
+
 
+
Another tool you can use, which is recommended, is Cygwin/X (www.cygwin.com). It provides an all-in-one, UNIX like environment. It comes with SSH support as a package. The installation is straightforward. During the installation, you will need to select the *Net* package. By default, SSH is NOT installed. If you are familiar with UNIX, then you can work within Cygwin/X using UNIX commands as if you were on a Linux box.
+
 
+
 
+
=== What operating systems are supported? ===
+
UNIX in general. Currently, Linux is the only operating system used within SHARCNET.
+
 
+
=== What makes a cluster different than my UNIX workstation? ===
+
 
+
If you are familiar with UNIX, then using a cluster is not much different from using a workstation. When you login to a cluster, you in fact only log in to one of the cluster nodes. In most cases, each cluster node is a physical machine, usually a server class machine, with one or several CPUs, that is more or less the same as a workstation you are familiar with. The difference is that these nodes are interconnected with special interconnect devices and the way you run your program is slightly different. Across SHARCNET clusters, you are not expected to run your program interactively. You will have to run your program through a queueing system. That also means where and when your program gets to run is not decided by you, but by the queueing system.
+
 
+
=== Which cluster should I use? ===
+
Each of our clusters is designed for a particular type of job. Our [https://www.sharcnet.ca/my/systems/clustermap cluster map] shows which systems are suitable for various job types.
+
 
+
=== What programming languages are supported? ===
+
The primary programming languages C, C++ and Fortran are supported. Other languages, such as Java, Pascal and Ada, are also supported, but with limited technical support from us. That means that if your program is written in any language other than C, C++ or Fortran and you encounter a problem, we may or may not be able to solve it within a short period of time.
+
 
+
=== How do I organize my files? ===
+
Our experience is that when large amounts of storage are available, it is too easy to lose track of files, let stale copies accumulate, etc. The number of files that one can truly manage is also fairly modest and does not scale over time, or with availability of storage. For these reasons, SHARCNET provides the following pools of storage:
+
 
+
{| class="wikitable" style="text-align:left" border="1"
+
! place        !! quota    !! expiry  !! access          !! purpose
+
|-
+
| /home    ||  200 MB  ||  none    ||  unified          ||  sources, small config files
+
|-
+
| /work      ||  200 GB  ||  none    || per-cluster    ||  active data files
+
|-
+
| /scratch  ||  none      || 36 days ||  per-cluster  ||  temporary files, checkpoints
+
|-
+
| /tmp      ||  none    ||  2 days  ||  per-node    ||  node-local scratch
+
|-
+
| archive || none || none || unified command-access || long term data archive
+
|}
+
 
+
These distinctions reflect the fact that different kinds of files have very different properties, so are best implemented using different file systems, servers, RAID levels and backup policies.
+
 
+
Backups are in place for your home directory ''only''. Scratch and work are ''not'' backed up. In general we store one version of each file for the previous 5 working days, one for each of the 4 previous weeks, and one version per month before that. Backups began in September 2006.
+
 
+
The access column represents our design for the new SHARCNET environment; it is not implemented on all clusters yet. /home is shown as unified: when you log in, regardless of cluster, you always see the same directory. Since /home is remote on most clusters, it is important that you not have lots of jobs doing I/O to it. That is what /work is for, and is why most clusters have their own /work directory.
+
 
+
/scratch has no quota limit, so you can put as much data in /scratch/<userid> as you want until there is no more space. Note, however, that all files on /scratch that are more than 36 days old are deleted automatically.
+
 
+
Once a file is created in /scratch/<userid>, reading it, renaming it, changing its timestamps with 'touch', or copying it into another file has no effect on its expiry: the file will expire 36 days after it was created.
+
 
+
Only files that have been modified (e.g. more information written to the file) will be safe from deletion.
+
 
+
If you'd like to reliably back up large volumes of data to archive storage, use the ''archive'' command rather than leaving the data on the per-cluster /work filesystems.  There have been instances where data was lost on /work and /scratch, so it is definitely a good idea to back up your data to archive if it is important.  ''archive'' is provided by the "Archive Tools" with command-only access (i.e. it is not possible for users to directly manipulate the filesystem). See the following FAQ entry for further details.
+
 
+
=== How do I transfer my files to and from a cluster? ===
+
To transfer files to and from a cluster on a UNIX machine, you may use <tt>scp</tt> or <tt>sftp</tt>. For example, if you want to upload file <tt>foo.f</tt> to cluster narwhal from your machine <tt>myhost</tt>, use the following command
+
 
+
myhost$ scp foo.f narwhal.sharcnet.ca:
+
 
+
assuming that your machine has <tt>scp</tt> installed. If you want to transfer a file from Windows or Mac, you need to have <tt>scp</tt> or <tt>sftp</tt> for Windows or Mac installed.
+
 
+
If you transfer file <tt>foo.f</tt> between SHARCNET clusters, say from your home directory on narwhal to your scratch directory on requin, simply use the following command
+
 
+
[username@nar316 ~]$ scp foo.f requin:/scratch/username/
+
 
+
=== How do I transfer an entire directory to and from a cluster? ===
+
 
+
If you are transferring files between a UNIX machine and a cluster, you may use the <tt>scp</tt> command with the <tt>-r</tt> option. For instance, to download the subdirectory <tt>foo</tt> of the directory <tt>project</tt> in your home directory on whale to your local UNIX machine, run the following command on your local machine
+
 
+
myhost$ scp -rp whale.sharcnet.ca:project/foo .
+
 
+
Similarly, you can transfer the subdirectory between SHARCNET clusters. The following command
+
 
+
[username@nar316 ~]$ scp -rp requin:/scratch/username/foo .
+
 
+
will download subdirectory <tt>foo</tt> from your scratch directory on requin to your home directory on narwhal (note that the prompt indicates you are currently logged on to narwhal).
+
 
+
The <tt>-p</tt> option used above preserves the time stamp of each file. On Windows and Mac, check the documentation of your <tt>scp</tt> program to see which features it supports.
+
 
+
You may also <tt>tar</tt> and compress the entire directory and then use <tt>scp</tt> to save bandwidth. In the above example, first log in to narwhal, then do the following
+
 
+
[username@nar316 ~]$ cd project
+
[username@nar316 project]$ tar -cvf foo.tar foo
+
[username@nar316 project]$ gzip foo.tar
+
 
+
Then on your local machine myhost, use <tt>scp</tt> to copy the tar file
+
 
+
myhost$ scp narwhal.sharcnet.ca:project/foo.tar.gz .
+
 
+
Note that on most Linux distributions, <tt>tar</tt> has an option <tt>-z</tt> that compresses the <tt>.tar</tt> file using <tt>gzip</tt>.
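
The one-step form can be sketched as follows. This is a minimal illustration that runs entirely in a temporary directory; the directory layout and the placeholder file are made up for the example, and it assumes a <tt>tar</tt> that supports <tt>-z</tt> (GNU and BSD tar both do).

```shell
# Build a throwaway copy of the project/foo layout used in the text.
workdir=$(mktemp -d)
mkdir -p "$workdir/project/foo"
echo "      program main" > "$workdir/project/foo/foo.f"   # placeholder source file

# -c create, -z compress with gzip, -f archive name; -C changes into the
# project directory first so that the archive holds the relative path foo/...
tar -czf "$workdir/project/foo.tar.gz" -C "$workdir/project" foo

# -t lists the contents without extracting, a quick check before running scp.
archive_listing=$(tar -tzf "$workdir/project/foo.tar.gz")
printf '%s\n' "$archive_listing"
```

The resulting <tt>foo.tar.gz</tt> can then be copied with <tt>scp</tt> exactly as in the two-step example.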
+
 
+
=== How do I archive my data? ===
+
 
+
SHARCNET provides a small set of scripts (hereafter "Archive Tools") allowing users to move their data between clusters and our Archive (a long-term storage facility with 200 TB capacity). Please note that Archive is not directly mounted on any of our clusters, and can be accessed only through Archive Tools.
+
 
+
The only Archive Tools script directly used by users is <tt>archive</tt>. Executing this script without any parameters on any of our clusters will print a short description of the program:
+
 
+
[xxx@wha783 ~]$ archive
+
Usage:
+
    archive [--get name-you-chose] [--put name-you-choose
+
    list-of-local-files-and-dirs] [--remove name-you-choose] [--list
+
    [directory/file-name]]
+
+
+
Options:
+
    --put          store into an archive of the given name
+
                    (the name can include subdirectories).
+
    --get          retrieve files of the given name
+
                    (also works with subdirectories).
+
    --list        show your archives or directories (or specific one).
+
    --remove      remove an archive or an empty directory.
+
+
    -h or --help        show usage (default option).
+
    --man              show man page.
+
 
+
The Tools allow basic functionality, such as an ability to move a collection of local files and/or directories to Archive (option <tt>--put</tt>), to move the data back (<tt>--get</tt>), to list the user's archives (<tt>--list</tt>), and to delete an archive (<tt>--remove</tt>).
+
 
+
An example: you want to archive the contents of two directories (with all the subdirectories), <tt>DIR1</tt> and <tt>DIR2</tt>, and two files, <tt>file1</tt> and <tt>file2</tt>. First, come up with a good (descriptive) name for your archive, say <tt>DIR1-2_file1-2</tt>. Then, execute the following command:
+
 
+
archive --put DIR1-2_file1-2  DIR1 DIR2 file1 file2
+
 
+
It will create an (uncompressed) TAR file <tt>DIR1-2_file1-2</tt> in Archive containing the data from <tt>DIR1</tt>, <tt>DIR2</tt>, <tt>file1</tt>, and <tt>file2</tt>. To better organize their archives, users can create directories and subdirectories in Archive during the execution of <tt>archive --put</tt>. For example,
+
 
+
archive --put Level1/Level2/name file1 file2
+
 
+
will first create nested directories <tt>Level1/Level2</tt> in Archive, and then will copy <tt>file1</tt> and <tt>file2</tt> to the archive <tt>Level1/Level2/name</tt>. If the archive with such a name already exists in Archive, the command will fail.
+
 
+
'''TIP''': It is strongly recommended to create a list of the files being archived during the execution of <tt>archive --put ...</tt>, by redirecting the command's output to a local file:
+
 
+
archive --put Level1/Level2/name file1 file2 >& name.list
+
 
+
This is especially important if you create large (many gigabytes) archives containing many files. Keep these list files in one location. This will significantly simplify the task of locating one particular file or directory in your archives.
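
Once the list files are kept together, finding the archive that holds a given file is a one-line search. The sketch below is only an illustration: the two list files and their contents are invented, and it assumes each list contains one path per line, which may not match the exact output format of <tt>archive --put</tt>.

```shell
# Hypothetical stand-ins for list files saved from earlier "archive --put" runs.
listdir=$(mktemp -d)
printf '%s\n' 'DIR1/results.dat' 'DIR1/notes.txt' 'DIR2/run.log' > "$listdir/DIR1-2_file1-2.list"
printf '%s\n' 'file1' 'file2' > "$listdir/name.list"

# -l prints only the names of the list files that contain a match;
# -F treats the search term as a fixed string rather than a pattern.
matching_lists=$(grep -lF 'results.dat' "$listdir"/*.list)
basename "$matching_lists"
```

The name of the matching list file tells you which archive to retrieve with <tt>archive --get</tt>.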
+
 
+
The opposite command is <tt>--get</tt>. For the above examples, the files will be copied back from Archive after executing the following commands:
+
 
+
archive --get DIR1-2_file1-2
+
archive --get Level1/Level2/name
+
 
+
If the local directories or files with the same names already exist, they will not be overwritten, and the program will produce error messages.
+
 
+
Finally, individual archives or empty directories can be removed from Archive by executing <tt>archive --remove</tt>. Each deletion has to be explicitly confirmed (by typing "yes"). For example, to delete <tt>Level1/Level2/name</tt> and the directories that contain it from Archive, execute the following sequence of commands, confirming each deletion with "yes":
+
 
+
archive --remove Level1/Level2/name
+
archive --remove Level1/Level2
+
archive --remove Level1
+
 
+
=== SSH tells me SOMEONE IS DOING SOMETHING NASTY!? ===
+
 
+
Suppose you attempt to login to SHARCNET, but instead get an alarming message like this:
+
 
+
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!    @
+
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
+
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
+
It is also possible that the RSA host key has just been changed.
+
The fingerprint for the RSA key sent by the remote host is
+
fe:65:ab:89:9a:23:34:5a:50:1e:05:d6:bf:ec:da:67.
+
Please contact your system administrator.
+
Add correct host key in /home/user/.ssh/known_hosts to get rid of this message.
+
Offending key in /home/hahn/.ssh/known_hosts:42
+
RSA host key for requin has changed and you have requested strict checking.
+
Host key verification failed.
+
 
+
SSH normally tries to verify that the host you're connecting to is authentic. It does this by caching the host's "hostkey" in your <tt>~/.ssh/known_hosts</tt> file. At times we may legitimately need to change a host's hostkey; when this happens, you may see such a message. It's a good idea to verify this with us, but the normal answer is simply to remove the old hostkey from your <tt>known_hosts</tt> file. You can use your choice of editor if you're comfortable doing so (it's a plain text file, but has long lines). On a UNIX-compatible machine, you can also use the following very small script:
+
 
+
perl -pi -e 'undef $_ if (++$line == 42)' ~/.ssh/known_hosts
+
 
+
(Substitute the line(s) printed in the warning message for '42' here.)
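
If Perl is not available, the same deletion can be done with <tt>sed</tt>. The sketch below works on a made-up sample file rather than your real <tt>known_hosts</tt>, so the entries and the line number are purely illustrative. On systems with OpenSSH, <tt>ssh-keygen -R hostname</tt> is another way to remove the entries for a given host.

```shell
# Made-up stand-in for ~/.ssh/known_hosts; real entries are much longer.
sample_hosts=$(mktemp)
printf '%s\n' \
    'narwhal ssh-rsa AAAA-placeholder-key-1' \
    'requin ssh-rsa AAAA-placeholder-stale-key' \
    'whale ssh-rsa AAAA-placeholder-key-3' > "$sample_hosts"

# The number printed after "known_hosts:" in the warning message.
offending_line=2

# Write every line except the offending one to a new file, leaving the
# original untouched so the result can be inspected before replacing it.
sed "${offending_line}d" "$sample_hosts" > "$sample_hosts.new"
cat "$sample_hosts.new"
```

After checking the new file, move it over the original to complete the change.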
+
 
+
Another solution is brute-force: remove the whole <tt>known_hosts</tt> file. This throws away any authentication checking, and your first subsequent connection to any machine will prompt you to accept a newly discovered host key. If you find this prompt annoying, you can avoid it by adding a text file named <tt>~/.ssh/config</tt> on your machine with the following content:
+
 
+
StrictHostKeyChecking no
+
 
+
=== Ssh works, but scp doesn't! ===
+
 
+
If you can <tt>ssh</tt> to a cluster successfully, but cannot <tt>scp</tt> to it, the problem is likely that your login scripts print unexpected messages which confuse <tt>scp</tt>. <tt>scp</tt> is based on the same <tt>ssh</tt> protocol, but assumes that the connection is "clean": that is, that it does not produce any unrequested output. If your login script contains something like:
+
 
+
echo "Hello, Master; I await your command..."
+
 
+
<tt>scp</tt> will be confused by the salutation. To avoid this, simply ensure that the message is only printed on an interactive login:
+
 
+
if [ -t 0 ]; then
+
    echo "Hello, Master; I await your command..."
+
fi
+
 
+
or in csh/tcsh syntax:
+
 
+
if ( -t 0 ) then
+
    echo "Hello, Master; I await your command..."
+
endif
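
The effect of the <tt>-t 0</tt> test can be checked locally without involving <tt>scp</tt>. In this sketch the function name is hypothetical and simply stands in for the greeting section of a login script; redirecting standard input from <tt>/dev/null</tt> imitates a session that has no terminal attached.

```shell
# Hypothetical stand-in for the greeting section of a login script.
greet_if_interactive() {
    # -t 0 is true only when standard input (file descriptor 0) is a terminal.
    if [ -t 0 ]; then
        echo "Hello, Master; I await your command..."
    fi
}

# With stdin redirected there is no terminal, so the greeting is suppressed
# and a non-interactive session sees no stray output.
scp_like_output=$(greet_if_interactive < /dev/null)
printf 'output seen by a non-interactive session: "%s"\n' "$scp_like_output"
```

Calling <tt>greet_if_interactive</tt> directly from a terminal prints the greeting as before.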
+
 
+
=== Running commands remotely ===
+
 
+
It is also possible to specify a command to run at the end of an ssh command.  A command like <tt>ssh narwhal.sharcnet.ca sqjobs</tt>, however, will not work because ssh does not set up a full environment by default.  To get the same environment that you get when you log in, it is necessary to run the command under bash in login mode.
+
 
+
myhost$ ssh narwhal.sharcnet.ca bash -l -c sqjobs
+
 
+
If you wish to specify a command longer than a single word, it is necessary to quote it as the bash <tt>-c</tt> only takes a single argument.  In order to pass these quotes through to ssh, however, it is necessary to escape them.  Otherwise the local shell will interpret them and strip them off.  An example is
+
 
+
myhost$ ssh narwhal.sharcnet.ca bash -l -c \' sqsub -r 5h ./myjob \'
+
 
+
Most problems with these commands are related to the local shell interpreting things that you wish to pass through to the remote side (e.g., stripping out any unescaped quotes).  Use <tt>-v</tt> with ssh and <tt>set -x</tt> with bash to see what command(s) ssh and bash are executing, respectively.
+
 
+
myhost$ ssh -v narwhal.sharcnet.ca bash -l -c \' sqsub -r 5h ./myjob \'
+
myhost$ ssh narwhal.sharcnet.ca bash -l -c \' set -x\; sqsub -r 5h ./myjob \'
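
The quoting behaviour can be inspected locally, without connecting anywhere. The sketch below relies on the fact that ssh joins its command arguments with single spaces before sending them to the remote shell; the variable names are illustrative only.

```shell
# Unescaped quotes: the local shell strips them, leaving one argument that
# merely contains spaces. Once joined, the remote shell no longer sees it
# as a single argument to -c.
set -- bash -l -c ' sqsub -r 5h ./myjob '
unescaped_command="$*"
printf 'unescaped: %s\n' "$unescaped_command"

# Escaped quotes: the quote characters survive as separate arguments, so
# the joined string still carries them for the remote shell to interpret.
set -- bash -l -c \' sqsub -r 5h ./myjob \'
remote_command="$*"
printf 'escaped:   %s\n' "$remote_command"
# second line printed: escaped:   bash -l -c ' sqsub -r 5h ./myjob '
```

Only the escaped form delivers the whole <tt>sqsub</tt> command line to <tt>bash -c</tt> as one argument on the remote side.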
+
 
+
=== How do I edit my program on a cluster? ===
+
 
+
We provide a variety of editors, such as the traditional text-mode <tt>emacs</tt> and <tt>vi</tt> (vim), as well as a simpler one called <tt>nano</tt>. If you have X on your desktop (and properly tunneled through SSH), you can use the GUI versions (<tt>xemacs</tt>, <tt>gvim</tt>).
+
 
+
== Compiling and Running Programs ==
+
 
+
=== How do I compile my programs? ===
+
To make it easier to compile across all SHARCNET clusters, we provide a generic set of commands:
+
 
+
cc, c++, f77, f90, f95
+
 
+
and for MPI,
+
 
+
mpicc, mpic++, mpiCC, mpif77, mpif90, mpif95
+
 
+
These commands provide several benefits:
+
* they select optimization appropriate to the cluster's CPUs.
+
* using <tt>-lmpi</tt> or the mpi-prefixed commands will select the necessary cluster-specific options for MPI.
+
* using <tt>-llapack</tt> will link with the vendor-tuned LAPACK library.
+
* using <tt>-openmp</tt> will direct the compiler to use OpenMP.
+
 
+
Here are some basic examples:
+
 
+
cc foo.c -o foo
+
cc -openmp foo.c -llapack -o foo
+
f90 *.f90 -lmpi -o my_mpi_prog
+
mpif90 *.f90 -o my_mpi_prog
+
f90 -mpi -c a.f90; mpif90 -c b.f90; compile a.o b.o -lmpi -o my_mpi_prog
+
 
+
In the first example, the preferred compiler and optimization flags are selected, but not much else happens. In the second, the underlying compiler's OpenMP flag (which differs among compilers) is selected, and the program is linked with a system-tuned LAPACK/BLAS library. In the third example, an MPI program written in Fortran 90 is compiled and linked with whatever cluster-specific MPI libraries are required. The fourth example is identical except that the mpi-prefixed command is used. In the fifth example, two files are compiled separately and then linked with the MPI libraries; the point is simply that even when you are only compiling and not linking, you need to declare that you're using MPI, either with an mpi-prefixed command or with <tt>-mpi</tt> or <tt>-lmpi</tt>.
+
 
+
These commands invoke the underlying compilers, such as the [http://software.intel.com/en-us/intel-compilers/ Intel] or [http://www.pathscale.com/ PathScale] compilers, whichever are available on the system you are using. For specific compiler options, please refer to the man pages.
+
 
+
You aren't required to use these commands, and may not want to if you have pre-existing Makefiles, for instance. You can always add <tt>-v</tt> to see what full commands are being generated. Here's a brief summary of compilers available on various systems:
+
 
+
{| class="wikitable" style="text-align:left" border="1"
+
! system          !! compilers
+
|-
+
| Opteron systems (requin, narwhal, whale, bull) || pathcc pgcc gcc
+
|-
+
| Xeon systems (saw, mako)                      || icc ifc ifort gcc
+
|-
+
| Itanium2 systems (silky)                      || icc ifc ifort
+
|}
+
 
+
On Intel Itanium 2 clusters in particular, you should always use the high performance Intel compilers <tt>icc</tt> and <tt>ifort</tt> for C/C++ and Fortran code respectively, if available. They give much better performance than the generic GNU compilers on this chip.
+
 
+
=== How do I run a program? ===
+
 
+
In general, users are expected to run their jobs in "batch mode".  That is, one submits a job -- the application problem -- to a queue through a batch queue command, the scheduler schedules the job to run at a later time and sends the results back once the program is finished.
+
 
+
In particular, use the SQ command <tt>sqsub</tt> (see [[FAQ#What_is_the_batch_job_scheduling environment_SQ.3F|What is the batch job scheduling environment SQ?]] below) to launch a serial job <tt>foo</tt>:
+
 
+
sqsub -o foo.log -r 5h ./foo
+
 
+
This submits the command <tt>foo</tt> as a job with a 5 hour runtime limit and puts its standard output into the file <tt>foo.log</tt> (note that it is important not to put too tight a runtime limit on your job, as it may sometimes run slower than expected due to interference from other jobs).
+
 
+
If your program takes command line arguments, place the arguments after your program name just as when you run the program interactively
+
 
+
sqsub -o foo.log -r 5h ./foo arg1 arg2...
+
 
+
For example, suppose your program takes command line options <tt>-i input</tt> and <tt>-o output</tt> for input and output files respectively. Placed after the program name, they are treated as arguments of your program, not as options of <tt>sqsub</tt>:
+
 
+
sqsub -o foo.log -r 5h ./foo -i input.dat -o output.dat
+
 
+
To launch a parallel job <tt>foo_p</tt>
+
 
+
sqsub -q mpi -n num_cpus -o foo_p.log -r 5h ./foo_p
+
 
+
The basic queues on SHARCNET are:
+
 
+
{| class="wikitable" style="text-align:left" border="1"
+
! queue    !! usage
+
|-
+
| serial  || for serial jobs
+
|-
+
| mpi      || for parallel jobs using the MPI library
+
|-
+
| threaded || for threaded jobs using OpenMP or POSIX threads
+
|}
+
 
+
To see the status of submitted jobs, use command <tt>sqjobs</tt>.
+
 
+
=== What is the batch job scheduling environment SQ? ===
+
 
+
SQ is a unified frontend for running jobs on SHARCNET, intended to hide unnecessary differences in how the clusters are configured.  On clusters which are based on RMS, LSF+RMS, or Torque+Maui, SQ is just a thin shell of scripting over the native commands.  On Wobbie, the native queuing system is called SQ.
+
 
+
To run a job, you use <tt>sqrun</tt>:
+
 
+
sqrun -n 16 -q mpi -r 5h ./foo
+
 
+
This runs <tt>foo</tt> as an MPI command on 16 processors with a 5 hour runtime limit (make sure to be somewhat conservative with the runtime limit as a job may run for longer than expected due to interference from other jobs).  You can control input, output and error output using these flags:
+
 
+
sqrun -o outfile -i infile -e errfile -r 5h ./foo
+
 
+
This will run <tt>foo</tt> with its input coming from a file named <tt>infile</tt>, its standard output going to a file named <tt>outfile</tt>, and its error output going to a file named <tt>errfile</tt>.  Note that using these flags is preferred over shell redirection, since the flags permit your program to do IO directly to the file, rather than having the IO transported over sockets, then to a file.
+
 
+
Often, especially with IO redirection as above, it is convenient to submit a job, and not wait for it to run.  To do this, simply add a <tt>--bg</tt> switch to <tt>sqrun</tt>, or equivalently use <tt>sqsub</tt>.  It makes no difference to the scheduler whether you run (wait to complete) or submit (batch mode).
+
 
+
For threaded applications (which use Pthreads, OpenMP, or fork-based parallelism), do this:
+
 
+
sqsub -q threaded -n 2 -r 5h ./foo
+
 
+
Serial jobs require no flags beyond the runtime
+
 
+
sqrun -r 5h ./foo
+
 
+
but you can provide IO redirection flags if you wish.
+
 
+
=== How do I show and control jobs under SQ? ===
+
To show your jobs, use <tt>sqjobs</tt>. By default, it will show you only your own jobs. With <tt>-a</tt> or <tt>-u all</tt>, it will show the jobs of all users. Similarly, <tt>-u someuser</tt> will show jobs only for that particular user.
+
 
+
To kill, suspend or resume your jobs, use sqkill/suspend/resume with the job ID as shown by sqjobs.
+
 
+
Note also that providing the -v switch to sqrun/sqsub will print the jobid at submission time.
+
 
+
=== How do I translate my LSF command to SQ? ===
+
SQ very strongly resembles LSF commands such as bsub. For instance, here are two versions, the first assuming LSF, the second using SQ:
+
 
+
bsub -q mpi -n 16 -o term.out ./ParTISUN
+
sqsub -q mpi -n 16 -o term.out ./ParTISUN
+
 
+
There are some differences:
+
* SQ doesn't have static queues like LSF. Instead, the "-q" simply describes the kind of job - MPI(parallel), threaded or serial.
+
* SQ doesn't use the extra "prun" in there - it knows that parallel jobs always need the prun.
+
* sqjobs is quite similar to bjobs.
+
* sqkill/suspend/resume is quite similar to bkill/suspend/resume.
+
 
+
=== How can I submit jobs that will run where ever there are free cpus? ===
+
 
+
SHARCNET clusters differ in several ways: access to particular storage and cluster node properties. For instance, if you submit a job which refers to files in /work or /scratch, it may currently only run on that particular cluster. Similarly, a job may require, for instance, a very large amount of memory per processor, only available on Bull. But some jobs which do little IO, and which are serial and use modest amounts of memory may be run using the "global jobs" facility.
+
 
+
To submit a global job, just add <tt>--global</tt> to the sqsub command:
+
 
+
sqsub --global -o my.log ./program
+
sqjobs --global
+
 
+
Again, this currently only applies to jobs which can run in your /home tree (which is very limited in size and speed), and which are serial.
+
 
+
=== Can I use a script to compile and run programs? ===
+
Yes. For instance, suppose you have a number of source files <tt>main.f, sub1.f, sub2.f, ..., subN.f</tt>. To compile this source code into an executable <tt>myprog</tt>, you would likely type the following command:
+
 
+
f77 -o myprog main.f sub1.f sub2.f ... subN.f -llapack
+
 
+
Here, the <tt>-o</tt> option specifies the executable name <tt>myprog</tt> rather than the default <tt>a.out</tt>, and the option <tt>-llapack</tt> at the end tells the compiler to link your program against the LAPACK library, if LAPACK routines are called in your program. If you have a long list of files, typing the above command every time can be really annoying. You can instead put the command in a file, say <tt>mycomp</tt>, then make <tt>mycomp</tt> executable by typing the following command:
+
 
+
chmod +x mycomp
+
 
+
Then you can just type
+
 
+
./mycomp
+
 
+
at the command line to compile your program.
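
A minimal sketch of such a script follows, assuming a Fortran compiler named <tt>f77</tt> and three of the illustrative file names used above; adjust both to match your project. The <tt>FC</tt> variable is not part of any SHARCNET tool; it is a hypothetical convenience added here so that another compiler can be substituted.

```shell
# Create the helper script "mycomp" (file names are the illustrative ones from above).
cat > mycomp <<'EOF'
#!/bin/sh
# Stop at the first failing command.
set -e
# FC is a hypothetical override: run "FC=ifort ./mycomp" to use another compiler.
FC=${FC:-f77}
"$FC" -o myprog main.f sub1.f sub2.f -llapack
EOF

# Make the script executable; afterwards "./mycomp" rebuilds myprog.
chmod +x mycomp
```

Keeping the compile line in one file means the flags and library list are written down once and reused unchanged on every rebuild.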
+
 
+
This is the simplest way to deal with multiple source files, but not the most efficient one, since every file is recompiled each time. A more efficient way to compile multiple files and use different libraries is to use [http://mrbook.org/blog/tutorials/make/ make].
+
 
+
=== I have a program that runs on my workstation, how can I have it run in parallel? ===
+
If the program was written without parallelism in mind, then there is very little that you can do to run it automatically in parallel. Some compilers are able to translate some serial portions of a program, such as loops, into equivalent parallel code, which allows you to exploit the parallelism found mostly in symmetric multiprocessing (SMP) systems. Also, some libraries are able to use parallelism internally, without any change in the user's program. For this to work, your program needs to spend most of its time in the library, of course - the parallel library doesn't speed up your program itself. Examples of this include threaded linear algebra and FFT libraries.
+
 
+
However, to gain true parallelism and scalability, you will need to either rewrite the code using the message passing interface (MPI) library or annotate your program using OpenMP directives. We will be happy to help you parallelize your code if you wish. (Note that OpenMP is inherently limited by the size of a single node or SMP machine.)
+
 
+
Also, the preceding answer pertains only to the idea of running a ''single'' program faster using parallelism. Often, you might want to run many different configurations of your program, differing only in a set of input parameters. This is common when doing Monte Carlo simulation, for instance. It's usually best to start out doing this as a series of independent serial jobs. It ''is'' possible to implement this kind of loosely-coupled parallelism using MPI, but often less efficient and more difficult.
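
As a rough sketch of the "series of independent serial jobs" approach, the loop below submits one serial job per parameter value, using only the <tt>sqsub</tt> options shown earlier in this FAQ (<tt>-o</tt> and <tt>-r</tt>). The program name <tt>./foo</tt>, the parameter values and the <tt>SUBMIT</tt> variable are hypothetical; by default the commands are only printed, so the loop can be checked before anything is submitted.

```shell
# Dry run by default: with SUBMIT=echo each command is printed, not executed.
# Set SUBMIT to the empty string (SUBMIT= ) on a cluster to really call sqsub.
SUBMIT=${SUBMIT-echo}

# Submit one serial job per parameter, each with its own log file.
submit_sweep() {
    for p in "$@"; do
        $SUBMIT sqsub -o "foo_$p.log" -r 5h ./foo "$p"
    done
}

# Hypothetical parameter values; ./foo stands in for your serial program.
submit_sweep 0.1 0.2 0.5
```

Because each job writes to its own log file and shares nothing with the others, the scheduler is free to start them whenever single processors become available.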
+
 
+
=== How can I have a quick test run of my program? ===
+
Sometimes you may experience long waiting time before your program in the queue starts running. To allow users to test their programs, a "test queue" is provided, which enables users to launch their programs quickly.
+
 
+
To have a test run, use sqsub option <tt>--test</tt>. For example, if you have an MPI program mytest that uses 8 processors, you may use the following command
+
 
+
sqsub --test -q mpi -n 8 -o mytest.log ./mytest
+
 
+
The only difference here is the "<tt>--test</tt>". The scheduler will normally start such test jobs within a few seconds.
+
 
+
The main purpose of the test queue is to quickly test the startup of a changed job - just to verify that for a real, production run, it won't hit a bug shortly after starting (for instance, due to missing parameters).
+
 
+
The "test queue" only allows one to run a test program for a very short period of time, therefore you must make sure that your test run will not take hours to finish. In addition, the system monitors user submissions and decreases the priority of submitted jobs over time within an internally defined time window. Hence if you keep submitting jobs as test runs, the waiting time before those jobs start will get longer, or you will not be able to submit test jobs any more.
+
 
+
=== Which system should I choose? ===
+
There are many clusters, many of them specialized in some way.  We provide an [http://www.sharcnet.ca/my/systems/clustermap interactive map] of SHARCNET systems on the web portal which visually presents a variety of criteria as a decision making aid.  In brief however, depending on the nature of your jobs, there may be a clear preference for which cluster is most appropriate:
+
;is your job serial?
+
:Whale is probably the right choice, since it has a very large number of processors, and consequently has high throughput. Your job will probably run soonest if you submit it here.
+
;do you use a lot of memory?
+
:Bull or Hound is probably the right choice.
+
;does your MPI program utilize a lot of communication?
+
:Requin and Bull have the fastest networks, but it's worth trying Narwhal or Saw if you aren't familiar with the specific differences between Quadrics, Myrinet and Infiniband.
+
;does your job (or set of jobs) do a lot of disk IO?
+
:you probably want to stick to one of the major clusters (Bull/Narwhal/Requin/Saw/Whale) which have bigger and much faster (parallel) filesystems.
+
 
+
=== Where can I find available resources? ===
+
Information about available computational resources is available to the public on the SHARCNET web site: see our [https://www.sharcnet.ca/my/systems systems page] and our [https://www.sharcnet.ca/my/perf_cluster/cur_perf cluster performance page].
+
 
+
The change of status of each system, such as down time, power outage, etc is announced through the following three different channels:
+
 
+
* '''Web links under [https://www.sharcnet.ca/my/perf_cluster/cur_perf systems]'''. You need to check the web site from time to time in order to catch such public announcements.
+
 
+
* '''System notice mailing list'''. This is the passive way of being informed. You receive the notices in e-mail as soon as they are announced. But some people might feel it is annoying to be informed. Also, such notices may be buried in dozens or hundreds of other e-mail messages in your mail box, hence are easily ignored.
+
 
+
* '''SHARCNET [https://www.sharcnet.ca/Media/rss/ RSS broadcasting]'''. A good analogy of RSS is like traffic information on the radio. When you are on a road trip and you want to know what the traffic conditions are ahead, you turn on the car radio, tune-in to a traffic news station and listen to updates periodically. Similarly, if you want to know the status of SHARCNET systems or the latest SHARCNET news, events and workshops, you can turn to RSS feeds on your desktop computer.
+
 
+
The term RSS may stand for Really Simple Syndication, RDF Site Summary, or Rich Site Summary depending on the version. Written in the format of XML, RSS feeds are used by websites to syndicate their content. RSS feeds allow you to read through the news you want, at your own convenience. The messages will show up on your desktop, e.g. using Mozilla [https://www.sharcnet.ca/Media/rss/rss_example_thunderbird.php Thunderbird], an integrated mail client software, as soon as there is an update.
+
 
+
=== Can I find my job submission history? ===
+
Yes. Every job submission you make is recorded in a database. Each record contains the command, the submission time, the start time, the completion time, the exit status of your program (i.e. succeeded or failed), the number of CPUs used, the system, and so on.
+
 
+
You may review the history by logging in to your web account.
+
 
+
=== How are jobs scheduled? ===
+
Job scheduling is the mechanism which selects waiting jobs ("queued") to be started ("dispatched") on nodes in the cluster. In all cases, production SHARCNET clusters are "exclusively" scheduled, so that a job will have complete access to the CPUs it's currently running on. Jobs may, however, sometimes be preempted (put into a suspended state) if some other, higher-priority job must be started. Normally, preemption happens only for "test" jobs, which are fairly short (always less than 1 hour). After being preempted, a job is resumed (and the intervening period is not counted as usage).
+
 
+
In practice, if there are enough free processors to run your job, and no one else has any jobs queued, then you should expect your jobs to start immediately. Once there are more jobs queued than available resources, the scheduler will attempt to arbitrate between the CPU demands of all queued jobs. This arbitration happens in the following order: Dedicated Resource jobs first, then "test" jobs (which may also preempt normal jobs), and finally normal jobs. Within the set of pending normal jobs, the scheduler will prefer jobs belonging to groups which have high "fairshare" priority.
+
 
+
Fairshare is based on a measure of recent (currently, past 2 months) resource usage. All user groups are ranked into 5 priority levels, with the heaviest users given lowest priority. You can examine your group's recent usage and priority here: [https://www.sharcnet.ca/my/profile/mysn Research Group's Usage and Priority]
+
 
+
For information on expected queue wait times, users can check the [https://www.sharcnet.ca/my/perf_cluster/cur_perf Recent Cluster Statistics] table in the web portal.
+
 
+
==== Some specific scheduling idiosyncrasies:====
+
One problem with cluster scheduling is that for a typical mix of job types (serial, threaded, various-sized MPI), the scheduler will rarely accumulate enough free CPUs at once to start any larger job. When a job completes, it frees N cpus. If there's an N-cpu job queued (and of appropriate priority), it'll be run. Frequently, jobs smaller than N will start instead. This may still give 100% utilization, but each of those jobs will complete, probably at different times, effectively fragmenting the N into several smaller sets. Only a period of idleness (lack of queued smaller jobs) will allow enough cpus to collect to let larger jobs run.
+
 
+
Narwhal uses a form of reservation scheduling to address this: for a fixed period of time, the scheduler will accumulate idle cpus in an attempt to run the currently highest-priority job. If it takes too long, other jobs will be started, and the accumulation will begin again. The accumulation period is chosen to optimize the chances of running jobs of a target size (around 32 cpus).
+
 
+
Requin is intended to enable "capability", or very large jobs. Rather than eliminating the ability to run more modest job sizes, Requin is configured with a weekly cycle: every Monday at noon, all previously running jobs will have finished and large queued jobs can start. One implication of this is that no job over 1 week can be run (and a 1-week job will only have one chance per week to start). Shorter jobs can be started at any time, but only a 1-day job can be started on Sunday, for instance.
+
 
+
Note that both Requin and Narwhal now enforce runtime limits - if the job is still running at the end of the stated limit, it will be terminated. (Before December 1 2008, only Narwhal would enforce runtime limits.)
+
 
+
Gaussian jobs on Bull are also scheduled somewhat differently: they are given a separate queue, which provides slightly higher priority to groups who have bought into the SHARCNET Gaussian license.
+
 
+
Finally, when running DDT or OPT (debugger and profiler), it's normal to use the test queue. If you need to run such jobs longer than 1 hour, and find the wait times too high when using the normal queues, let us know (open a ticket). It may be that we need to provide a special queue for these uses - possibly preemptive like the test queue.
+
 
+
 
+
== Programming and Debugging ==
+
 
+
=== What is MPI? ===
+
 
+
[http://www.mpi-forum.org/ MPI] stands for Message Passing Interface, a standard for writing portable parallel programs which is well-accepted in the scientific computing community. MPI is implemented as a library of subroutines which is layered on top of a network interface. The MPI standard has provided both C/C++ and Fortran interfaces so all of these languages can use MPI. There are several MPI implementations, including [http://www.open-mpi.org/ OpenMPI] and [http://www.mcs.anl.gov/research/projects/mpich2/ MPICH]. Specific high-performance interconnect vendors also provide their own libraries - usually a version of MPICH layered on an interconnect-specific hardware library. For SHARCNET Alpha clusters, the interconnect is Quadrics, which provides MPI and a low-level library called "elan". For Myrinet, the low-level library is MX or GM.
+
 
+
In addition to C/C++ and Fortran versions of MPI, there exist other language bindings as well. If you have any special needs, please contact us.
+
 
+
=== What is OpenMP? ===
+
 
+
[http://openmp.org/wp/ OpenMP] is a standard for programming shared memory systems using threads, with compiler directives inserted in the source code. It provides a higher-level approach to utilizing multiple processors within a single machine while keeping the structure of the source code as close to the conventional form as possible. OpenMP is much easier to use than the alternative (Pthreads) and thus is suitable for adding modest amounts of parallelism to pre-existing code. Because OpenMP is a set of directives (pragmas in C/C++, special comments in Fortran), your code can still be compiled by a serial compiler and should still behave the same.
+
 
+
OpenMP for C/C++ and Fortran is supported by many compilers, including the PathScale and PGI compilers for Opterons, and the Intel compilers for IA32 and IA64 (such as SGI's Altix).  OpenMP support has been provided in the GNU compiler suite since v4.2 (OpenMP 2.5), and starting with v4.4 it supports the OpenMP 3.0 standard.
+
 
+
=== How do I run an OpenMP program with multiple threads? ===
+
 
+
An OpenMP program uses a single process with multiple threads rather than multiple processes. On SMP systems, threads will be scheduled on available processors, thus run concurrently. In order for each thread to run on one processor, one needs to request the same number of CPUs as the number of threads to use. This is done differently on different systems at SHARCNET where queueing systems are used. For instance, on Tru64 Alpha clusters, to run an OpenMP program foo that uses four threads with sqrun command, use the following
+
 
+
sqrun -q threaded -n 4 ./foo
+
 
+
The option <tt>-n 4</tt> specifies to reserve 4 CPUs per process. The same command applies to all systems which support sqrun (SQ).
+
 
+
=== What mathematics libraries are available? ===
+
 
+
Every system has the basic linear algebra libraries [http://www.netlib.org/blas/ BLAS] and [http://www.netlib.org/lapack/ LAPACK] installed. Normally, these interfaces are contained in vendor-tuned libraries.  On Intel-based (Xeon, Itanium2) clusters, users have access to the Intel Math Kernel Library (MKL). On Opteron-based clusters, AMD's ACML library is available.
+
 
+
One may also find the GNU Scientific Library ([http://www.gnu.org/software/gsl/ GSL]) useful for some needs. The GNU Scientific Library is an optional package, available on any machine.
+
 
+
For a detailed list of libraries on each cluster, please check the documentation on the corresponding SHARCNET satellite web sites.
+
 
+
=== How do I use mathematics libraries such as BLAS and LAPACK routines? ===
+
 
+
First you need to know which subroutine you want to use. Check the references to find which routines meet your needs. Then place calls to those routines in your program and compile your program to use the particular libraries that contain them. For instance, if you want to compute the eigenvalues, and optionally the eigenvectors, of an ''N by N'' real nonsymmetric matrix in double precision, you will find that the LAPACK routine <tt>DGEEV</tt> does that. All you need to do is add a call to <tt>DGEEV</tt>, with the required parameters as specified in the LAPACK documentation, and compile your program to link against the LAPACK library.
+
 
+
f77 -o myprog main.f sub1.f sub2.f ... sub13.f -llapack
+
 
+
The option <tt>-llapack</tt> tells the compiler to use library <tt>liblapack.a</tt>.
+
 
+
If the system you are using has vendor supplied libraries that have optimized LAPACK routines, such as Intel's math kernel library MKL (<tt>libmkl.a</tt>) or AMD's ACML library (<tt>libacml.a</tt>), then use those libraries with options <tt>-lmkl</tt> or <tt>-lacml</tt> instead, as they will give you better performance. The installation directories of those vendor libraries may vary from site to site. If such a library is not installed in one of the standard directories <tt>/lib</tt>, <tt>/usr/lib</tt> or <tt>/usr/local/lib</tt>, then you will probably have to specify the lookup path for the compiler. For instance, on the Itanium2 cluster Spinner, the Intel version of LAPACK in the math kernel library is located in <tt>/opt/intel/mkl/lib/64</tt>, so for the above example one would use the command
+
 
+
ifort -o myprog main.f sub1.f sub2.f ... sub13.f -L/opt/intel/mkl/lib/64 -lmkl_lapack
+
 
+
where <tt>ifort</tt> is the Intel Fortran compiler, the option <tt>-L/opt/intel/mkl/lib/64</tt> specifies the library search path, and <tt>-lmkl_lapack</tt> names the library to link against. Please check the local documentation at each site for details.
+
 
+
You should never need to copy or use the individual source code of those library routines and compile them together with your program.
+
 
+
=== My code is written in C/C++, can I still use those libraries? ===
+
 
+
Yes. Most of the libraries have C interfaces. If you are not sure about the C interface or you need assistance in using those libraries written in Fortran, we can help you out on a case-by-case basis.
+
 
+
=== What packages are available? ===
+
 
+
Various packages have been installed on SHARCNET clusters at users' requests. Custom installed packages include, for example, [http://www.gaussian.com/ Gaussian], [http://www.mcs.anl.gov/petsc/petsc-2/ PETSc], [http://www.r-project.org/ R], [http://www.featflow.de/ Featflow], [http://www.msg.chem.iastate.edu/gamess/ Gamess], [http://dasher.wustl.edu/tinker/ Tinker], [http://www.umass.edu/microbio/rasmol/ Rasmol], and [http://www.maplesoft.com/ Maple]. Please check the SHARCNET [http://www.sharcnet.ca web portal] for the [https://www.sharcnet.ca/my/software software packages] installed and related usage information.
+
 
+
=== What interconnects are used on SHARCNET clusters? ===
+
 
+
Currently, several different interconnects are being used on SHARCNET clusters: [http://doc.quadrics.com/Quadrics/QuadricsHome.nsf/DisplayPages/Homepage Quadrics], [http://www.myri.com/ Myrinet], [http://www.infinibandta.org/ InfiniBand] and standard IP-based ethernet.
+
 
+
=== I would like to do some grid computing, how should I proceed? ===
+
 
+
Depends on what you mean by "grid computing". If you simply mean you want to queue up a bunch of jobs (MPI, threaded or serial) and have them run without further attention, then great! SHARCNET's model is exactly that kind of grid. However, we do not attempt to hide differences between clusters, such as file systems that are remote, different types of CPUs or interconnect. We do not currently attempt to provide a single queue which feeds jobs to all of the clusters. Such a unified grid would require you to ensure that your program was compiled and configured to run under Alpha Linux, Alpha Tru64, IA32 Linux, IA64 Linux, AMD64 Linux. It would also have to assume nothing about shared file systems, and it would have to be aware of the 5000x difference in latency when sending messages within a cluster versus between clusters, as well as either rely on least-common-denominator networking (ethernet) or else explicitly manage the differences between Quadrics, Myrinet, Infiniband and ethernet.
+
 
+
If, however, you would like to try something "unusual" that requires much more freedom than the current resource management system can handle, then, you would need to discuss the details of your plan with us for special arrangement.
+
 
+
=== Debugging serial and parallel programs ===
+
 
+
A ''debugger'' is a program which helps to identify mistakes ("bugs") in programs, either at run time or "post-mortem" (by analyzing the core file produced by a crashed program). Debuggers can be either command-line or GUI (graphical user interface) based. Before a program can be debugged, it needs to be (re-)compiled with the switch <tt>-g</tt>, which tells the compiler to include symbolic information in the executable.
+
 
+
SHARCNET provides a few debuggers. <tt>gdb</tt> (installed on all clusters, type "<tt>man gdb</tt>" to get a list of options), <tt>pathdb</tt> (Opteron clusters), and <tt>idb</tt> (Silky) are the command-line-based ones. The idb debugger also has a GUI (run "<tt>idb -gui</tt>"). For a brief introduction to common programming bugs and how to use gdb at SHARCNET, please see the [[Common Bugs and Debugging with gdb]] online tutorial.
+
 
+
The above debuggers can only be used for serial programs. For parallel codes (either MPI or threads based), one can use the commercial debugger DDT, which is installed on most of our Opteron clusters (requin, narwhal, bull, and the PoP clusters). DDT has an advanced GUI, and can also be used for debugging serial programs. A short description of DDT can be found on our [http://www.sharcnet.ca/my/software Software] page. You can also refer to the detailed [[Parallel Debugging with DDT]] online tutorial.
+
 
+
===  What does my error code mean ? ===
+
 
+
When your program returns an error code it means something went wrong. You can find out the meaning of the error code by typing the following command:
+
 
+
        man 7 signal
+
 
+
However, if you want to fix the problem then you should use a debugger (e.g. gdb or DDT) to locate the instructions that are causing the problem. For a brief introduction on how to use gdb see the [[Common Bugs and Debugging with gdb]] online tutorial.
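
As an illustration of how an error code relates to the signals documented in <tt>man 7 signal</tt>, the sketch below assumes the error code is the exit status reported by a POSIX shell. By shell convention, a status above 128 means the program was killed by signal number (status - 128). The child shell that terminates itself is only a stand-in for a real program.

```shell
# Run a child shell that terminates itself with SIGTERM (signal 15),
# standing in for a program that was killed by a signal.
status=0
sh -c 'kill -TERM $$' || status=$?

# A status above 128 means "killed by signal (status - 128)".
if [ "$status" -gt 128 ]; then
    signum=$((status - 128))
    echo "exit status $status: killed by signal $signum ($(kill -l "$signum"))"
else
    echo "exit status $status: the program exited on its own"
fi
```

The same arithmetic applies to other signals; for example, a status of 139 corresponds to signal 11, a segmentation fault.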
+
 
+
== Getting Help ==
+
 
+
=== I have encountered a problem while using a SHARCNET system and need help, who should I talk to? ===
+
If you have access to the Internet, we encourage you to use the [https://www.sharcnet.ca/my/problems/submit problem ticketing system] through the [http://www.sharcnet.ca web portal], as this is the most efficient way of reporting a problem and receiving a quick response. You are also welcome to contact system administrators and/or high performance technical computing consultants at any time. You may find the contact information on the [https://www.sharcnet.ca/my/contact/directory directory] page.
+
 
+
=== How do I use the problem ticket system? ===
+
 
+
We have a simple problem ticket system that you can find by logging into our website and clicking on "Help" then "Problems". It allows you to submit and track your problem tickets. If you are searching for a solution, one thing you can do is type keywords into the "Search" field in the upper right hand corner of the SHARCNET web page, and this will perform a search of current and past problem tickets. Someone else may have already submitted a problem ticket on the same issue. Once you find a ticket that concerns you, you can add a comment or click "watch" to get email updates.
+
 
+
 
+
=== How do I give other users access to my files ? ===
+
 
+
To give other users access to your files, follow this procedure:
+
 
+
In the instructions below instead of <your-subdirectory> type the
+
name of the subdirectory where the files you want to give access
+
to are located:
+
 
+
(1) Go to your home directory by typing:
+
 
+
        cd
+
 
+
(2) Authorize access to the home directory by typing command:
+
 
+
        chmod go+rX  .
+
 
+
 
+
(3) Authorize access to  subdirectory <your-subdirectory> by typing command:
+
 
+
        chmod -R go+rX <your-subdirectory>
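
To see what steps (2) and (3) do before applying them to real data, the following sketch replays them in a throwaway directory. The directory and file names (<tt>shared</tt>, <tt>data</tt>, <tt>results.txt</tt>) are hypothetical stand-ins for your home directory and <your-subdirectory>.

```shell
# Work in a throwaway directory that stands in for your home directory.
demo_home=$(mktemp -d)
mkdir -p "$demo_home/shared/data"
touch "$demo_home/shared/data/results.txt"
# Start from owner-only permissions.
chmod -R go-rwx "$demo_home"

# Step (2): let group and others enter and list the top-level directory only.
chmod go+rX "$demo_home"
# Step (3): open the chosen subdirectory recursively.
chmod -R go+rX "$demo_home/shared"

# Inspect the resulting permissions.
ls -ld "$demo_home" "$demo_home/shared/data"
ls -l "$demo_home/shared/data/results.txt"
```

The capital <tt>X</tt> adds execute (search) permission to directories, and to files that are already executable, so ordinary data files become readable without becoming executable.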
+
 
+
=== What do I need to specify in a ticket ? ===
+
 
+
If you do not find any tickets that deal with your current problem, then you should log in to our website, click on "Help", then "Problems", then "Submit", and supply detailed information on your problem. In the Comments box, enter information on how you execute the program (i.e. specify the commands you use to run the job).
+
 
+
If this is a program that uses source files to compile your program then indicate
+
where the source files are located and give read access to your files and directories
+
(starting from the home directory down to the subdirectory where the source files are).
+
See previous FAQ: How do I give other users access to my files
+
 
+
 
+
=== How do I checkpoint/restart my program on a  SHARCNET cluster ?  ===
+
 
+
Assuming it is a serial or multi-threaded program (i.e. *not* MPI), you can make use
+
of the Berkeley Lab Checkpoint/Restart (BLCR) software that is provided on the clusters.
+
Documentation and usage instructions can be found on SHARCNET's [http://www.sharcnet.ca/my/software/show/74 BLCR software] page.

=== I am new to parallel programming, where can I find quick references at SHARCNET? ===

SHARCNET has a number of training modules on parallel programming using MPI, OpenMP, pthreads and other frameworks. Each of these modules has working examples that are designed to be easy to understand while illustrating basic concepts. You may find these, along with copies of slides from related presentations and links to external resources, on the [[Main Page]] of this training/help site.

=== I am new to parallel programming, can you help me get started with my project? ===

Absolutely. We will be glad to help you at every stage, from planning the project, architecting your application programs with appropriate algorithms and choosing efficient tools to solve associated numerical problems, to debugging and analyzing your code. We will do our best to help you speed up your research.

=== Can you install a package on a cluster for me? ===

Certainly. We suggest you make the request by sending e-mail to [mailto:help@sharcnet.ca help@sharcnet.ca], or by [https://www.sharcnet.ca/my/problems/submit opening a problem ticket] with the specific request.

=== I am in the process of purchasing computer equipment for my research, would you be able to provide technical advice on that? ===

If you tell us what you want, we may be able to help you out.

=== Does SHARCNET have a mailing list or user group? ===

Yes. You may subscribe to one or more mailing lists on the email list page, which is available once you log into the web portal.

=== Does SHARCNET provide any training on programming and using the systems? ===

Yes. SHARCNET provides workshops on specific topics from time to time and offers courses at some sites. Every June, SHARCNET holds an annual ''summer school'' with a variety of in-depth, hands-on workshops. All materials from past workshops/presentations can be found on the SHARCNET web portal.

== Research at SHARCNET ==

=== Where can I find what other people do at SHARCNET? ===

You may find some of the research activities at SHARCNET by visiting our [https://www.sharcnet.ca/my/research/initiatives research initiatives] and [https://www.sharcnet.ca/my/research/profiles researcher profile] pages.

=== I have a research project I would like to collaborate on with SHARCNET, who should I talk to? ===

You may contact [mailto:barb@sharcnet.ca SHARCNET head office] or contact members of the SHARCNET technical staff.

=== How can I contribute compute resources to SHARCNET so that other researchers can share it? ===

Most people's research is "bursty" - there are usually sparse periods of time when some computation is urgently needed, and other periods when there is less demand. One problem with this is that if you purchase the equipment you need to meet your "burst" needs, it'll probably sit, underutilized, during other times.

An alternative is to donate control of this equipment to SHARCNET, and let us arrange for other users to use it when you are not. We prefer to be involved in the selection and configuration of such equipment. Some of SHARCNET's most useful clusters were created this way: Goblin and Wobbie were purchased with user contributions. Our promise to contributors is that as much as possible, they should obtain as much benefit from the cluster as if it were not shared. Owners get preferential access. Naturally, owners are also able to burst to higher peak usage, since their equipment has been pooled with other contributions. (Technically, SHARCNET cannot itself own such equipment; it remains owned by the institution in question, and will be returned to the contributor upon request.) If you think this model will also work for you and you would like to contribute your computational resource to help the research community at SHARCNET, you can contact us to make such an arrangement.

=== I do not know much about computation, nor is it my research interest. But I am interested in getting my research done faster with the help of the high performance computing technology. In other words, I do not care about the process and mechanism, but only the final results. Can SHARCNET provide this type of help? ===

We will be happy to bring the technology of high performance computing to you to accelerate your research, if at all possible. If you would like to discuss your plan with us, please feel free to contact our high performance computing specialists. They will be happy to listen to your needs and are ready to provide appropriate suggestions and assistance.

== Fellowships at SHARCNET ==

=== I heard SHARCNET offers fellowships, where can I get more information? ===

You may find additional information regarding fellowships and other dedicated resource opportunities on the [https://www.sharcnet.ca/my/research/fellowships Research Fellowships page] of the web portal. A dedicated online [http://www.sharcnet.ca/Documents/faqs.pdf FAQ] is also available.

=== I would like to do some research at SHARCNET as a visiting scholar, how should I apply? ===

In general, you will need to find a hosting department or a person affiliated with one of the SHARCNET institutions. You may also [http://www.sharcnet.ca/my/contact contact us] directly for more specific information.

=== I would like to send my students to SHARCNET to do some work for me. How should I proceed? ===

See above.

== Contacting SHARCNET ==

=== How do I contact SHARCNET for research, academic exchanges, and technical issues? ===

Please contact our Scientific Director or check for your specific issue in this FAQ.

=== How do I contact SHARCNET for business development, education and other issues? ===

Please contact [mailto:barb@sharcnet.ca SHARCNET head office].

== How to Acknowledge SHARCNET in Publications ==

=== How do I acknowledge SHARCNET in my publications? ===

We recommend the following:

This work was made possible by the facilities of the Shared Hierarchical
Academic Research Computing Network (SHARCNET:www.sharcnet.ca).

=== I've seen different spellings of the name, what is the standard spelling of SHARCNET? ===

We suggest the spelling SHARCNET, all in upper case.

== What types of research programs / support are provided to the research community? ==

Our overall intent is to provide support that can both respond to the range of needs that the user community presents and help to increase the sophistication of the community and enable new and larger-in-scope applications making use of SHARCNET's HPC facilities. The range of support can perhaps best be understood in terms of a pyramid:

=== Level 1 ===

At the apex of the pyramid, SHARCNET supports a small number of projects with dedicated programmer support. The intent is to enable projects that will have a lasting impact and may lead to a "step change" in the way research is done at SHARCNET. Inter-disciplinary and inter-institutional projects are particularly welcomed. Projects can expect to receive 2 to 6 months of direct support per year for one to two years. Programming time is allocated through a competitive process. See the [https://www.sharcnet.ca/Documents/SN_prog_application_guidelines.pdf guidelines].

=== Level 2 ===

The middle layers of support are provided through a number of initiatives.

These include:
* Programming support of more modest duration (several days to one month engagement, usually part time)
* Training on a variety of topics through [[Main Page|workshops, seminars and online training materials]]
* Consultation. This may include user-initiated interactions on particular programs, algorithms, techniques, debugging, optimization etc., as well as unsolicited help to ensure effective use of SHARCNET systems
* Site Leaders play an important role in working with the community to help researchers connect with SHARCNET staff and to obtain appropriate help and support.

=== Level 3 ===

The base level of the pyramid handles the very large number of small requests that are essential to keeping the user community working effectively with the infrastructure on a day-to-day basis. Several of these can be answered by this FAQ; many of the issues are presented through the [https://www.sharcnet.ca/my/problems/submit ticketing system]. The support is largely problem-oriented, with each problem being time-limited.

Latest revision as of 12:36, 10 May 2019

Knowledge Base / Expanded FAQ

Note: Most of the old FAQ entries have been removed as they are now covered at the Compute Canada level: https://docs.computecanada.ca/wiki/Frequently_Asked_Questions Some of the information on this page was moved to the Legacy Systems page.



About SHARCNET

What is SHARCNET?

SHARCNET stands for Shared Hierarchical Academic Research Computing Network. Established in 2000, SHARCNET is the largest high performance computing consortium in Canada, involving 18 universities and colleges across southern, central and northern Ontario.

SHARCNET is a member consortium in the Compute/Calcul Canada national HPC platform.

Where is SHARCNET?

The main office of SHARCNET is located in the Western Science Centre at The University of Western Ontario. The SHARCNET high performance clusters are installed at a number of the member institutions in the consortium and operated by SHARCNET staff across different sites.

What does SHARCNET have?

The primary SHARCNET compute system is the Graham heterogeneous cluster located at the University of Waterloo. It is named after Wes Graham, the first director of the Computing Centre at Waterloo. It consists of 36,160 cores and 320 GPU devices, spread across 1,127 nodes of different configurations.

What can I do with SHARCNET?

If you have a program that takes months to run on your PC, you could probably run it within a few hours using hundreds of processors on the SHARCNET clusters, provided your program is inherently parallelisable. If you have hundreds or thousands of test cases to run through on your PC or the computers in your lab, then running those cases independently on hundreds of processors will significantly shorten your test cycles.
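
As a rough, best-case illustration of the second scenario, the sketch below estimates the wall-clock time when independent cases of equal length are spread over many processors. The function names, the case count and the per-case duration are made up for illustration, and the estimate ignores queue wait times, start-up overhead and cases of uneven length.

```python
import math


def ideal_wall_time(n_cases, minutes_per_case, n_workers):
    """Best-case wall-clock minutes for independent, equal-length cases."""
    if n_cases < 0 or n_workers < 1:
        raise ValueError("need n_cases >= 0 and n_workers >= 1")
    # Each worker handles one case at a time, so the cases run in "rounds".
    rounds = math.ceil(n_cases / n_workers)
    return rounds * minutes_per_case


def split_cases(cases, n_workers):
    """Deal the cases round-robin into one list per worker."""
    return [cases[i::n_workers] for i in range(n_workers)]


# Hypothetical workload: 1000 test cases of 30 minutes each.
serial = ideal_wall_time(1000, 30, 1)      # one PC, one case at a time
parallel = ideal_wall_time(1000, 30, 200)  # 200 processors
print(f"serial: {serial} min, parallel: {parallel} min")

# Ten cases dealt out to three workers.
print(split_cases(list(range(10)), 3))
```

Under these assumptions, 1,000 cases of 30 minutes each take about three weeks when run one after another, but only five rounds of 30 minutes on 200 processors.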

If you have used Beowulf clusters made of commodity PCs, you may notice a performance improvement on SHARCNET clusters, which have high-speed InfiniBand interconnects, as well as on SHARCNET machines that have large amounts of memory. Also, SHARCNET clusters themselves are connected through a dedicated, private connection over the Ontario Research Innovation Optical Network (ORION).

If you have access to supercomputing facilities elsewhere and you wish to share your ideas with us and SHARCNET users, please contact us. Together we can make SHARCNET better.

Who is running SHARCNET?

The daily operation and development of SHARCNET computational facilities is managed by a group of highly qualified system administrators. In addition, we have a team of high performance technical computing consultants, who are responsible for technical support on libraries, programming and application analysis.

How do I contact SHARCNET?

For technical inquiries, you may send e-mail to help@sharcnet.ca, or contact your local system administrator or HPC specialist. For general inquiries, you may contact the SHARCNET main office.

Getting an Account with SHARCNET and Related Issues

To use SHARCNET (and also Compute Canada) facilities one has to apply for a Compute Canada account.


Getting Help

I have encountered a problem while using a Compute Canada/SHARCNET system and need help, who should I talk to?

If you have access to the Internet, we encourage you to use the problem ticketing system (described in detail below). This is the most efficient way of reporting a problem, as it minimizes email traffic and will likely result in a faster response than other channels.

You are also welcome to contact system administrators and/or high performance technical computing consultants at any time. You may find their contact information on the directory page.

How long should I expect to wait for support?

Unfortunately, Compute Canada/SHARCNET does not have adequate funding to provide support 24 hours a day, 7 days a week. User support and system monitoring are limited to regular business hours: there is no official support on weekends or holidays, or outside 9:00 - 17:00 EST.

Please note that this includes monitoring of our systems and operations, so when there are problems overnight or on weekends/holidays, system notices will typically not be posted until the next business day.

Compute Canada Problem Ticket System

What is a "problem ticket system"?

This is a system that allows anyone with a Compute Canada account to start a persistent email thread that is referred to as a "problem ticket". When a user submits a new ticket it will be brought to the attention of an appropriate and available Compute Canada/SHARCNET staff member for resolution.

You can interact with the ticket system entirely via email. There is also a web interface to see tickets you have submitted in the past.

What do I need to specify in a ticket?

To help us address your question faster, please try to do the following when submitting a ticket:

  1. specify which of our systems is involved
  2. if the problem pertains to a job, then report the jobid associated with the job; this is an integer that is returned by the scheduler when you submit the job
  3. report the exact commands necessary to duplicate the problem, as well as any error output that helps identify the problem; if relevant, this should include how the code is compiled, how the job is submitted, and/or anything else you are doing from the command line relating to the problem
  4. if you'd like for a particular staff member to be aware of the ticket, mention them
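
As a minimal sketch of the checklist above, the following assembles a ticket email with Python's standard `email` library. The helper name `build_ticket`, the sender address, the job ID, the commands and the error text are all hypothetical placeholders; only the support address is the one given in this FAQ. The sketch builds the message but does not send it.

```python
from email.message import EmailMessage

SUPPORT_ADDRESS = "support@computecanada.ca"  # address given in this FAQ


def build_ticket(sender, system, summary, commands,
                 error_output="", job_id=None, staff=None):
    """Assemble a ticket email covering the four checklist items."""
    lines = [f"System: {system}"]                      # item 1
    if job_id is not None:
        lines.append(f"Job ID: {job_id}")              # item 2
    lines += ["", "Commands to reproduce:"]            # item 3
    lines.extend(f"  $ {cmd}" for cmd in commands)
    if error_output:
        lines += ["", "Error output:", error_output]
    if staff:
        lines += ["", f"Please make {staff} aware of this ticket."]  # item 4

    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = SUPPORT_ADDRESS
    msg["Subject"] = f"[{system}] {summary}"
    msg.set_content("\n".join(lines))
    return msg


# Hypothetical example values, for illustration only.
ticket = build_ticket(
    sender="researcher@example.edu",
    system="graham",
    summary="job fails at startup",
    commands=["make", "./submit_job.sh"],
    error_output="Segmentation fault",
    job_id=1234567,
)
print(ticket)
```

Sending from the email address associated with your account lets the ticket be matched to you automatically.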

How do I submit a ticket?

In general, you can submit a new ticket by emailing support@computecanada.ca from the email address associated with your Compute Canada account. If you are using another email address, please provide your full name, your Compute Canada default username (if available) and your university or institution.

If you like, you can also target your inquiry more specifically, by using the following addresses to submit your ticket:

I am new to parallel programming, where can I find quick references at SHARCNET?

SHARCNET has a number of training modules on parallel programming using MPI, OpenMP, pthreads and other frameworks. Each of these modules has working examples that are designed to be easy to understand while illustrating basic concepts. You may find these along with copies of slides from related presentations and links to external resources on the Main Page of this training/help site.

I am new to parallel programming, can you help me get started with my project?

Absolutely. We will be glad to help you at every stage, from planning the project, architecting your application programs with appropriate algorithms and choosing efficient tools to solve associated numerical problems, to debugging and analyzing your code. We will do our best to help you speed up your research. If your programming project would involve a significant amount of staff time, you should consider applying for Dedicated Programming support. (We run the competition annually; see https://www.sharcnet.ca/my/research/programming).

Can you install a package on a cluster for me?

Certainly. We suggest you make the request by sending e-mail to help@sharcnet.ca with the specific request.

I am in the process of purchasing computer equipment for my research, would you be able to provide technical advice on that?

If you tell us what you want, we may be able to help you out.

Does SHARCNET provide any training on programming and using the systems?

Yes. SHARCNET provides workshops on specific topics from time to time and offers courses at some sites. Every summer (usually late May to early June), SHARCNET holds an annual HPC Summer School with a variety of in-depth, hands-on workshops. Many materials from past workshops/presentations can be found on SHARCNET's web portal.

SHARCNET also offers a series of online seminars (so-called "General interest webinars"), typically delivered every second Wednesday at lunch time. These are announced via the SHARCNET events mailing list, and the schedule is shown on the SHARCNET event calendar. Past seminars are recorded and posted on our YouTube channel. A full listing of the past webinars is available on the Online Seminars page.

Attending SHARCNET Webinars

SHARCNET makes a number of seminar events available online (New User Seminar, general interest talks, etc.) using software/services from Vidyo. Vidyo allows both the presenter and the attendees to offer or participate in online seminars by using their web browser or installing a small application. If this is your first Vidyo seminar, please join ahead of the official start to sort out any technical issues. Vidyo is supported on most platforms, both "stationary" (Windows, macOS, Linux) and mobile (iOS, Android).

Please note that if your device has a microphone (highly recommended) and/or webcam, they will be used by Vidyo to transmit your audio and video to all seminar participants. They will be on by default, but you can always disable them by clicking on the corresponding button at the bottom of your Vidyo window. We ask that all attendees keep their microphones muted unless they want to ask a question.

We normally record our seminars and make them available to all SHARCNET users. All recent and new webinars are posted on our YouTube channel, http://youtube.sharcnet.ca. The links to the video recordings, slides and abstracts can be found on our online seminars page.

If you do not have headphones and/or a microphone, we provide a toll-free call-in option: 1-855-728-4677, ext 5542.

To receive email notifications about upcoming General Interest seminars, Summer Schools, and other training events, add your email to our Events mailing list.

Please note that times for our webinars are for the Eastern Time (EST/EDT) zone.

Unsubscribing from our mailing lists

To remove yourself from our Events mailing list, please follow these instructions:

  • Go to the mailing list page (a link to it is at the bottom of our Events emails);
  • At the bottom of that page, enter your email into the empty field, and press the button "Unsubscribe or edit options";
  • On the next page, click on the "Unsubscribe" button;
  • You will receive a confirmation email; click on the link there to finalize the removal process.

Research at SHARCNET

I have a research project I would like to collaborate on with SHARCNET, who should I talk to?

You may contact SHARCNET head office or contact members of the SHARCNET technical staff.

How can I contribute compute resources to SHARCNET so that other researchers can share it?

Most people's research is "bursty" - there are usually sparse periods of time when some computation is urgently needed, and other periods when there is less demand. One problem with this is that if you purchase the equipment you need to meet your "burst" needs, it'll probably sit, underutilized, during other times.

An alternative is to donate control of this equipment to SHARCNET, and let us arrange for other users to use it when you are not. We prefer to be involved in the selection and configuration of such equipment. Our promise to contributors is that as much as possible, they should obtain as much benefit from the cluster as if it were not shared. Owners get preferential access. Naturally, owners are also able to burst to higher peak usage, since their equipment has been pooled with other contributions. (Technically, SHARCNET cannot itself own such equipment; it remains owned by the institution in question, and will be returned to the contributor upon request.) If you think this model will also work for you and you would like to contribute your computational resource to help the research community at SHARCNET, you can contact us to make such an arrangement.

I do not know much about computation, nor is it my research interest. But I am interested in getting my research done faster with the help of the high performance computing technology. In other words, I do not care about the process and mechanism, but only the final results. Can SHARCNET provide this type of help?

We will be happy to bring the technology of high performance computing to you to accelerate your research, if at all possible. If you would like to discuss your plan with us, please feel free to contact our high performance computing specialists. They will be happy to listen to your needs and are ready to provide appropriate suggestions and assistance.

I need access to more CPU cores or storage than are available by default, what programs exist to support demanding computation?

SHARCNET participates in the Compute Canada NRAC (National Resource Allocation Competition) and provides a continual competition for groups that require more than the default level of access to our resources. Please see Dedicated Resources for further information.

I heard SHARCNET offers fellowships, where can I get more information?

SHARCNET no longer actively runs a fellowship program. You may find information regarding past fellowships and other dedicated resource opportunities on the Research Fellowships page of the web portal.

I would like to do some research at SHARCNET as a visiting scholar, how should I apply?

In general, you will need to find a hosting department or a person affiliated with one of the SHARCNET institutions. You may also contact us directly for more specific information.

I would like to send my students to SHARCNET to do some work for me. How should I proceed?

See above.



Contacting SHARCNET

How do I contact SHARCNET for research, academic exchanges, and technical issues?

Please contact SHARCNET head office.

How do I contact SHARCNET for business development, education and other issues?

Please contact SHARCNET head office.

How do I contact a specific staff member at SHARCNET?

See staff directory for contact information.

How to Acknowledge SHARCNET in Publications

How do I acknowledge SHARCNET in my publications?

We recommend the following acknowledgement:

This work was made possible by the facilities of the Shared Hierarchical 
Academic Research Computing Network (SHARCNET:www.sharcnet.ca) and Compute/Calcul Canada.

I've seen different spellings of the name, what is the standard spelling of SHARCNET?

We suggest the spelling SHARCNET, all in upper case.


What types of research programs / support are provided to the research community?

Our overall intent is to provide support that responds to the range of needs the user community presents, helps to increase the sophistication of the community, and enables new and larger-in-scope applications that make use of SHARCNET's HPC facilities. The range of support can perhaps best be understood in terms of a pyramid:

Level 1

At the apex of the pyramid, SHARCNET supports a small number of projects with dedicated programmer support. The intent is to enable projects that will have a lasting impact and may lead to a "step change" in the way research is done at SHARCNET. Inter-disciplinary and inter-institutional projects are particularly welcomed. For the latest information about the program, including application guidelines, please see the Programming Competition page in our web portal.

Level 2

The middle layers of support are provided through a number of initiatives.

These include:

  • Programming support of more modest duration (several days to one month engagement, usually part time)
  • Training on a variety of topics through workshops, seminars and online training materials
  • Consultation. This may include user-initiated interactions on particular programs, algorithms, techniques, debugging, optimization etc., as well as unsolicited help to ensure effective use of SHARCNET systems
  • Site Leaders play an important role in working with the community to help researchers connect with SHARCNET staff and to obtain appropriate help and support.

Level 3

The base level of the pyramid handles the very large number of small requests that are essential to keeping the user community working effectively with the infrastructure on a day-to-day basis. Several of these can be answered by this FAQ; many of the issues are presented through the ticketing system. The support is largely problem-oriented, with each problem being time-limited.