<!--This page is transcluded to the main FAQ page.  If you make changes here, make sure the changes show up on the main FAQ page.  You may have to make an edit to the main FAQ page to force a refresh. -->

{{Template:GrahamUpdate}}
 
== Logging in to Systems, Transferring and Editing Files ==
 
=== How do I login to SHARCNET? ===
There is no single point of entry at present. "Logging in to SHARCNET" means you login to one of the SHARCNET systems. A complete list of SHARCNET systems can be found on [https://www.sharcnet.ca/my/systems our facilities page].

You access the SHARCNET clusters using ssh.  For Graham and other national systems, Compute Canada credentials are required. For the remaining systems, [https://www.sharcnet.ca/help/index.php?title=Legacy_Systems#Systems listed here], you will require SHARCNET credentials.
  
 
====Unix/Linux/OS X====
To login to a system, you need to use a [http://www.openssh.com/ Secure Shell (SSH)] connection. If you are logging in from a UNIX-based machine, make sure it has an SSH client (ssh) installed (this is almost always the case on UNIX/Linux/OS X). If you have the same login name on both your local system and SHARCNET, and you want to login to, say, <tt>graham</tt>, you may use the command:

  ssh graham.computecanada.ca
  
If your Compute Canada username is different from the username on your local systems, then you may use either of the following forms:

  ssh graham.computecanada.ca -l username
  ssh username@graham.computecanada.ca

If you want to establish an X window connection so that you can use graphics applications such as <tt>gvim</tt> and <tt>xemacs</tt>, you can add a <tt>-Y</tt> to the command:

  ssh -Y username@graham.computecanada.ca
  
 
This will automatically set the X DISPLAY variable when you login.
 
'''IMPORTANT''': to login to Graham, you have to use your Compute Canada credentials (login name and password), not your SHARCNET credentials!
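
If you connect frequently, you can record these options in your SSH client configuration so that a plain <tt>ssh graham</tt> does the right thing. This is a minimal sketch for the OpenSSH <tt>~/.ssh/config</tt> file; the username is a placeholder that you must replace with your own:

```
# ~/.ssh/config -- OpenSSH client configuration (example)
Host graham
    HostName graham.computecanada.ca
    User username             # replace with your Compute Canada username
    ForwardX11 yes            # equivalent to the -X command-line option
    ForwardX11Trusted yes     # together with ForwardX11, equivalent to -Y
```

With this in place, <tt>ssh graham</tt> expands to the full hostname, username and X forwarding options shown above.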
 
  
 
====Windows====

If you are logging in from a computer running Windows and need some pointers we recommend consulting our [[SSH]] tutorial.
  
=== What is the difference between Login Nodes and Compute Nodes? ===

====Login Nodes====
  
 
You can also use them for other quick tasks, like simple post-processing, but any significant work should be submitted as a job to the compute nodes.  On most login nodes, each process is limited to 1 cpu-hour; this will be noticeable if you perform anything compute-intensive, and can affect IO-oriented activity as well (such as very large ''scp'' or ''rsync'' operations).
 
Here is an example of logging in and being redirected to a login node on the legacy cluster saw, in this case ''saw-login1'':

 localhost:~ sn_user$ ssh saw.sharcnet.ca
 Last login: Fri Oct 14 22:38:40 2011 from localhost.your_institution.ca
 
 Welcome to the SHARCNET cluster Saw.
 Please see the following URL for status of this and other clusters:
 https://www.sharcnet.ca/my/systems
 
 [sn_user@saw-login1 ~]$ hostname
 saw-login1
 
 
====Development Nodes====

On some systems there are also ''development nodes'' which can be used to do slightly more resource-intensive, interactive work. For the most part these are identical to cluster login nodes; however, they are not visible outside of their respective cluster (one can only reach them after logging into a login node) and they have more modest resource limits in place, allowing for quick interactive testing outside of the job queuing system.  Please see the help wiki pages for the respective clusters, [[Orca]], [[Saw]] and [[Kraken]], for further details on how one can use these nodes.
 
  
 
=== How can I suspend and resume my session? ===
  
 
If you are familiar with UNIX, then using a cluster is not much different from using a workstation. When you login to a cluster, you in fact only log in to one of the cluster nodes. In most cases, each cluster node is a physical machine, usually a server-class machine with one or several CPUs, that is more or less the same as a workstation you are familiar with. The difference is that these nodes are interconnected with special interconnect devices and the way you run your program is slightly different. Across SHARCNET clusters, you are not expected to run your program interactively. You will have to run your program through a queueing system. That also means where and when your program gets to run is decided not by you, but by the queueing system.
 
=== Which cluster should I use? ===
 
Each of our clusters is designed for a particular type of job. Our [https://www.sharcnet.ca/my/systems/clustermap cluster map] shows which systems are suitable for various job types.
 
  
 
=== What programming languages are supported? ===
  
 
=== How do I organize my files? ===

To best meet a range of storage needs, SHARCNET provides a number of distinct storage pools that are implemented using a variety of file systems, servers, RAID levels and backup policies.  These different storage locations are summarized as follows:
==== Legacy systems ====
{| class="wikitable" style="text-align:left" border="1"
! place !! quota** !! expiry !! access !! purpose !! backed-up?
|-
| /home || 10 GB || none || unified || sources, small config files || Yes
|-
| /work || 1 TB || none || unified* || active data files || No
|-
| /scratch || none || 2 months || per-cluster || temporary files, checkpoints || No
|-
| /tmp || none || 2 days || per-node || node-local scratch || No
|-
| /freezer || 2 TB || 2 years || unified (login nodes only) || long term data archive || No
|}

*The quota column indicates if the file system has a per-user limit to the amount of data they can store.
*The expiry column indicates if the file system automatically deletes old files and the timescale for deletion.
*The access column indicates the scope, or availability, of the file system.  "unified" means that when you login, regardless of cluster, you will always see the same directory.

'''*''' May be less and not unified on some of our clusters (eg. requin and some of the specialty systems); type "quota" when you log into a cluster for up-to-date information.<br>
'''**''' There is also a quota on the maximum number of files a user can have on any file system. <font color="red">Currently the limit is '''1,000,000'''.</font>

''For more detailed information please go to the [[Using Storage]] article.''

====Where is my /work folder?====

/work is an ''automounted'' filesystem.  When you first login to a system your directory may not appear in the /work directory.  As soon as you access it (''cd'' to it, or ''ls'' it or its contents), the system will make your directory visible and it will appear in the /work directory.  If you are connecting with a GUI client you need to go to the full path of your work directory, /work/YOUR_USER_NAME.

====Best storage to use for jobs====

Since /home is remote on most clusters and is used frequently by all users, it's important that it not be used significantly for jobs (eg. reading in a small configuration file from /home is ok - writing repeatedly to many different files in /home during the course of your jobs is not).

One can do significant I/O to /work from jobs, but it is also remote to most clusters.  For this reason, to obtain the best file system throughput you should use the /scratch file system.  In some cases jobs may be able to make use of /tmp for local caching, but it is not recommended as a storage target for regular output.

For users who want to learn more about optimizing I/O at SHARCNET please read [[Analyzing I/O Performance]].

====Cluster-local Scratch storage====

/scratch has no quota limit - so you can put as much data in /scratch/<userid> as you want, until there is no more space. The important thing to note, though, is that all files on /scratch that are over 62 days old will be automatically deleted (please see [[Knowledge_Base#How_are_files_deleted_from_the_/scratch_filesystems?|this knowledge base entry]] for details on how /scratch is purged of old files).

====Backups====

Backups are in place for your home directory <b>ONLY</b>. Scratch and global work are ''not'' backed up. In general we store one version of each file for the previous 5 working days, one for each of the 4 previous weeks, and one version per month before that. Backups began in September 2006.

====Node-Local Storage====

/tmp may be unavailable for use on clusters where there are no local disks on the compute nodes.  Users should try to use /scratch instead, or [mailto:help@sharcnet.ca email help@sharcnet.ca] to discuss using node-local storage.

====Archival Storage====

To back up large volumes of data that don't need to stay available on global work or local scratch, use the /freezer filesystem.

'''Please note:''' unlike our old /archive file system, the new /freezer file system has both a size quota (2TB; going over the quota results in your submitted jobs not running - same as with /work) and an expiry: after 2 years your files will be deleted. See our [https://www.sharcnet.ca/my/systems/storage storage policies page] for details.

=== How do I organize my files? ['''Graham'''] ===

[[File:filesystemgraham.png|500px]]
 
  man chmod
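
To illustrate typical use of the modes described in that man page, here is a short sketch (the file and directory names are just examples):

```shell
# Restrict a file so only the owner can read and write it.
touch myresults.txt
chmod 600 myresults.txt      # owner: rw-, group: ---, others: ---
ls -l myresults.txt          # first column shows -rw-------

# Let your group read and enter a directory, but keep others out.
mkdir -p shared_dir
chmod 750 shared_dir         # owner: rwx, group: r-x, others: ---
```

Numeric modes like 600 and 750 are octal digits for owner, group and others; <tt>man chmod</tt> also documents the symbolic form (eg. <code>chmod g+r myresults.txt</code>).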
  
==== What about really large files or if I get the error 'No space left on device' in ~/project or ~/scratch? ====
  
 
If you need to work with really large files we have tips on optimizing performance with our parallel filesystems [[Analyzing_I/O_Performance#How_to_manage_really_large_files|here]].
  
 
=====Unix/Linux=====
To transfer files to and from a cluster on a UNIX machine, you may use <tt>scp</tt> or <tt>sftp</tt>. For example, if you want to upload file <tt>foo.f</tt> to cluster graham from your machine <tt>myhost</tt>, use the following command

  myhost$ scp foo.f graham.computecanada.ca:
  
 
assuming that your machine has <tt>scp</tt> installed. If you want to transfer a file from Windows or Mac, you need to have <tt>scp</tt> or <tt>sftp</tt> for Windows or Mac installed.

If you transfer file <tt>foo.f</tt> between SHARCNET clusters, say from your home directory on orca to your scratch directory on graham, simply use the following command

  [username@orc-login2:~]$ scp foo.f graham:/home/username/scratch/
  
If you are transferring files between a UNIX machine and a cluster, you may use the <tt>scp</tt> command with the <tt>-r</tt> option. For instance, if you want to download the subdirectory <tt>foo</tt> in the directory <tt>project</tt> in your home directory on graham to your local UNIX machine, on your local machine use the command

  myhost$ scp -rp graham.computecanada.ca:project/foo .
  
 
Similarly, you can transfer the subdirectory between SHARCNET clusters. The following command

  [username@orc-login2:~]$ scp -rp graham:/home/username/scratch/foo .

will download subdirectory <tt>foo</tt> from your scratch directory on graham to your home directory on orca (note that the prompt indicates you are currently logged on to orca).
  
 
The use of the <tt>-p</tt> option above will preserve the time stamp of each file. For Windows and Mac, you need to check the documentation of <tt>scp</tt> for features.

You may also <tt>tar</tt> and compress the entire directory and then use <tt>scp</tt> to save bandwidth. In the above example, first you login to graham, then do the following

  [username@gra-login2:~]$ cd project
  [username@gra-login2:~]$ tar -cvf foo.tar foo
  [username@gra-login2:~]$ gzip foo.tar
  
 
Then on your local machine myhost, use <tt>scp</tt> to copy the tar file
 
Then on your local machine myhost, use <tt>scp</tt> to copy the tar file
  
  myhost$ scp orca.sharcnet.ca:project/foo.tar.gz .
+
  myhost$ scp graham.computecanada.ca:project/foo.tar.gz .
  
 
Note for most Linux distributions, <tt>tar</tt> has an option <tt>-z</tt> that will compress the <tt>.tar</tt> file using <tt>gzip</tt>.
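
For example, the separate tar and gzip steps above can be collapsed into a single command (a sketch using a throwaway directory name):

```shell
# Create a small example directory, then archive and compress it in one step.
mkdir -p foo
echo "example data" > foo/data.txt
tar -czvf foo.tar.gz foo     # -z applies gzip compression while archiving
tar -tzf foo.tar.gz          # list the archive contents to verify
```

The matching one-step extraction is <code>tar -xzvf foo.tar.gz</code>.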
 
====How can I best transfer large quantities of data to/from SHARCNET and what transfer rate should I expect?====

In general, most users should be fine using ''scp'' or ''rsync'' to transfer data to and from SHARCNET systems.  If you need to transfer a lot of files, ''rsync'' is recommended to ensure that you do not need to restart the transfer from scratch should there be a connection failure.  Although you can use ''scp'' and ''rsync'' to any cluster's login node(s), it is often best to use gra-dtn1.computecanada.ca - it is dedicated to data transfer.
  
 
In general one should expect the following transfer rates with ''scp'':
 
Keep in mind that filesystems and networks are shared resources and suffer from contention; if they are busy the above rates may not be attainable.

For transferring large amounts of data (many gigabytes) the best approach is to use the online tool [https://www.sharcnet.ca/help/index.php/Globus Globus].
  
 
==== How do I access the same file from different subdirectories on the same cluster ? ====
 
or using the same file in different subdirectories).  Instead of using scp you might consider issuing a "soft link" command. Assume that you need access to the file large_file1 in subdirectory /home/user1/subdir1 and you need it to be in your subdirectory /home/my_account/my_dir, from where you will invoke it under the name my_large_file1.  Then go to that directory and type:

  ln -s /home/user1/subdir1/large_file1    my_large_file1
  
Another example: assume that in subdirectory /home/my_account/PROJ1 you have several subdirectories called CASE1, CASE2, ...  In each subdirectory CASEn you have a slightly different code, but all of them process the same data file called test_data. Rather than copying the test_data file into each CASEn subdirectory, place test_data above, i.e. in /home/my_account/PROJ1, and then in each CASEn subdirectory issue the following "soft link" command:

  ln -s /home/my_account/PROJ1/test_data    test_data
  
 
The "soft links" can be removed by using the rm command. For example, to remove the soft link from
 
The "soft links" can be removed by using the rm command. For example, to remove the soft link from
/work/my_account/PROJ1/CASE2 type following command from this subdirectory:
+
/home/my_account/PROJ1/CASE2 type following command from this subdirectory:
  
 
  rm -rf test_data
  
Typing the above command from subdirectory /home/my_account/PROJ1 would remove the actual file, and then none of the CASEn subdirectories would have access to it.
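
Putting these pieces together, a complete soft-link round trip might look like this (all paths are hypothetical; a relative link target is used here, which also works):

```shell
# Set up the example layout: one shared data file, one case directory.
mkdir -p PROJ1/CASE1
echo "1 2 3" > PROJ1/test_data

# Create the soft link inside the case directory.
cd PROJ1/CASE1
ln -s ../test_data test_data   # link points one level up to the shared file
cat test_data                  # reads the shared file through the link

# Removing the link does not touch the real file.
cd ..
rm CASE1/test_data
ls test_data                   # the actual file is still present
```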
  
==== How are files deleted from the /home/userid/scratch filesystems? ====
  
All files on /home/userid/scratch that are over 2 months old (not old in the common sense, please see below) are automatically deleted. Data needed for long term storage and reference should be kept in ~/project or other archival storage areas. The scratch filesystem is checked at the end of the month for files which will be candidates for expiry on the 15th of the following month. On the first day of the month, a login message is posted and a notification e-mail is sent to all users who have at least one file which is a candidate for purging, containing the location of a file which lists all the candidates for purging.

An unconventional aspect of this system is that it does not determine the age of a file based on the file's attributes, e.g., the dates reported by the <i>stat</i>, <i>find</i>, <i>ls</i>, etc. commands.  The age of a file is determined based on whether or not its data contents (i.e., the information stored in the file) have changed, and this age is stored externally to the file.  Once a file is created, reading it, renaming it, changing the file's timestamps with the <i>touch</i> command, or copying it into another file are all irrelevant in terms of changing its age with respect to the purging system. The file will be expired 2 months after it was created.  Only files whose contents have changed will have their age counter "reset".
  
 
Unfortunately, there currently exists no method to obtain a listing of the files that are scheduled for deletion.  This is something that is being addressed; however, there is no estimated time for implementation.
  
===== How do I check the age of a file? =====
We define a file's age as the most recent of:

*the access time (atime) and
*the change time (ctime)

You can find the ctime of a file using

  [name@server ~]$ ls -lc <filename>

while the atime can be obtained with the command

  [name@server ~]$ ls -lu <filename>

We do not use the modify time (mtime) of the file because it can be modified by the user or by other programs to display incorrect information.

Ordinarily, simple use of the atime property would be sufficient, as it is updated by the system in sync with the ctime. However, userspace programs are able to alter atime, potentially to times in the past, which could result in early expiration of a file. The use of ctime as a fallback guards against this undesirable behaviour.
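
All three timestamps can also be inspected in one place with <code>stat</code> (the <code>-c</code> format option assumes GNU coreutils, as found on our Linux clusters; the file name is just an example):

```shell
touch example.txt                      # create a file to inspect
stat -c 'atime: %x' example.txt        # last access time, as shown by ls -lu
stat -c 'mtime: %y' example.txt        # last content modification, as shown by ls -l
stat -c 'ctime: %z' example.txt        # last status/content change, as shown by ls -lc
```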
 
It is also your responsibility to manage the age of your stored data: most of the filesystems are not intended to provide an indefinite archiving service, so when a given file or directory is no longer needed you need to move it to a more appropriate filesystem, which may well mean your personal workstation or some other storage system under your control. Moving significant amounts of data between your workstation and a Compute Canada system, or between two Compute Canada systems, should generally be done using Globus.
  
 
==== How to archive my data? ====
  
===== Use tar to archive files and directories =====

The primary archiving utility on all Linux and Unix-like systems is the [https://www.gnu.org/software/tar/manual/tar.html tar] command. It will bundle a bunch of files or directories together and generate a single file, called an ''archive file'' or ''tar-file''. By convention an archive file has <code>.tar</code> as the file name extension. When you archive a directory with <code>tar</code>, it will, by default, include all the files and sub-directories contained within it, the sub-sub-directories contained in those, and so on. So the command <code>tar --create --file project1.tar project1</code> will pack all the content of directory ''project1'' into the file ''project1.tar''. The original directory will remain unchanged, so this may double the amount of disk space occupied!

You can extract files from an archive using the same command with a different option: <code>tar --extract --file project1.tar</code>. If there is no directory with the original name, it will be created. If a directory of that name exists and contains files of the same names as in the archive file, they will be overwritten. Another option can be added to specify the destination directory where to extract the archive's content.
===== Compress and uncompress tar files =====

The <code>tar</code> archiving utility can compress an archive file at the same time it creates it. There are a number of compression methods to choose from. We recommend either <code>xz</code> or <code>gzip</code>, which can be used as follows:

  [user_name@localhost]$ tar --create --xz --file project1.tar.xz project1
  [user_name@localhost]$ tar --extract --xz --file project1.tar.xz
  [user_name@localhost]$ tar --create --gzip --file project1.tar.gz project1
  [user_name@localhost]$ tar --extract --gzip --file project1.tar.gz

Typically, <code>--xz</code> will produce a smaller compressed file (a "better compression ratio") but takes longer and uses more RAM while working. <code>--gzip</code> does not typically compress as small, but may be used if you encounter difficulties due to insufficient memory or excessive run time during <code>tar --create</code>. A third option, <code>--bzip2</code>, is also available; it typically does not compress as small as <code>xz</code> but takes longer than <code>gzip</code>.

You can also run <code>tar --create</code> first without compression and then use the commands <code>xz</code> or <code>gzip</code> in a separate step, although there is rarely a reason to do so. Similarly, you can run <code>xz -d</code> or <code>gzip -d</code> to decompress an archive file before running <code>tar --extract</code>, but again there is rarely a reason to do so.

The commands <code>gzip</code> or <code>xz</code> can be used to compress any file, not just archive files:

  [user_name@localhost]$ gzip bigfile
  [user_name@localhost]$ xz bigfile

These commands will produce the files <code>bigfile.gz</code> and <code>bigfile.xz</code> respectively.
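
To reverse the compression, use the <code>-d</code> (decompress) flag mentioned above. A short round trip, using a throwaway file name:

```shell
# Compress a file, then restore it.
echo "some example data" > bigfile
gzip bigfile                 # produces bigfile.gz and removes bigfile
gzip -d bigfile.gz           # restores bigfile and removes bigfile.gz
cat bigfile                  # contents are unchanged
```

The same pattern works with <code>xz</code> and <code>xz -d</code>.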
  
=====Archival Storage=====

On Graham, files copied to ~/nearline will be subsequently moved to offline (tape-based) storage. See [https://docs.computecanada.ca/wiki/Using_nearline_storage this link] for more details.
  
 
==== How can I check the hidden files in a directory? ====
  
The "." at the beginning of the name means that the file is "hidden". You have to use the -a option with ls to see it. I.e. 'ls -a'.   
+
The "." at the beginning of the name means that the file is "hidden". You have to use the -a option with ls to see it. I.e. <code>ls -a </code>.   
  
 
If you want to display only the hidden files then type:
 
If you want to display only the hidden files then type:
 
  ls -d .*
 
  ls -d .*
  
Note: there is an alias which is loaded from /etc/bashrc (see your .bashrc file). The alias is defined by alias l.='ls -d .* --color=tty' and if you type:
+
Note: there is an alias which is loaded from /etc/bashrc (see your .bashrc file). The alias is defined by alias <code> l.='ls -d .* --color=auto' </code> and if you type:
 
  l.
 
  l.
 
you will also display only the hidden files.
 
you will also display only the hidden files.
Line 293: Line 250:
 
One can use the following command to count the number of files in a directory (in this example, your /work directory):
 
One can use the following command to count the number of files in a directory (in this example, your /work directory):
  
  find /work/$USER -type f  | wc -l
+
  find /home/$USER -type f  | wc -l
  
 
It is always a good idea to [[Knowledge_Base#How_to_archive_my_data.3F|archive]] and/or compress files that are no longer needed on the filesystem (see [https://www.sharcnet.ca/help/index.php/Knowledge_Base#How_to_organize_a_large_number_of_files.3F below]).  This helps minimize one's footprint on the filesystem and as such the impact they have on other users of the shared resource.
 
It is always a good idea to [[Knowledge_Base#How_to_archive_my_data.3F|archive]] and/or compress files that are no longer needed on the filesystem (see [https://www.sharcnet.ca/help/index.php/Knowledge_Base#How_to_organize_a_large_number_of_files.3F below]).  This helps minimize one's footprint on the filesystem and as such the impact they have on other users of the shared resource.
Line 319: Line 276:
 
The most likely cause of this behaviour is repeated failed login attempts.  Part of our security policies involves blocking the IP address of machines that attempt multiple logins with incorrect passwords over a short period of time---many brute-force attacks on systems do exactly this: looking for poor passwords, badly configured accounts, etc.  Unfortunately, it isn't uncommon for a user to forget their password and make repeated login attempts with incorrect passwords and end up with that machine blacklisted and unable to connect at all.
 
The most likely cause of this behaviour is repeated failed login attempts.  Part of our security policies involves blocking the IP address of machines that attempt multiple logins with incorrect passwords over a short period of time---many brute-force attacks on systems do exactly this: looking for poor passwords, badly configured accounts, etc.  Unfortunately, it isn't uncommon for a user to forget their password and make repeated login attempts with incorrect passwords and end up with that machine blacklisted and unable to connect at all.
  
A temporary solution is simply to attempt to login from another machine.  If you have access to another machine at your site, you can shell to that machine first, and then shell to the SHARCNET system (as that machine's IP shouldn't be blacklisted).  In order to have your machine unblocked, you will have to file a [https://www.sharcnet.ca/my/problems/submit problem ticket] as a system administrator must manually intervene in order to fix it.
+
A temporary solution is simply to attempt to login from another machine.  If you have access to another machine at your site, you can shell to that machine first, and then shell to the SHARCNET system (as that machine's IP shouldn't be blacklisted).  In order to have your machine unblocked, you will have to email to [mailto:support@computecanada.ca support@computecanada.ca] as a system administrator must manually intervene in order to fix it.
  
 
NOTE: there are other situations that can produce this message, however they are rarer and more transient.  If you are unable to log in from one machine, but can from another, it is most likely the IP blacklisting that is the problem and the above will provide a temporary work-around while your problem ticket is processed.
 
NOTE: there are other situations that can produce this message, however they are rarer and more transient.  If you are unable to log in from one machine, but can from another, it is most likely the IP blacklisting that is the problem and the above will provide a temporary work-around while your problem ticket is processed.
 
I can login successfully using WinSCP but I can't find my /work directory and files

Windows tools like WinSCP try to list the /work directory so that users can click to traverse it. This does not work because /work is automounted (and listing it would make little sense anyway, given that it contains several thousand entries). For a command-line user this is a non-issue, since anything that accesses /work/$USER will trigger the mount (for instance, ls -ld /work/$USER/ or cd).

A recommended workaround is to create a link from /work/$USER to /home/$USER/work and use that as the default directory in WinSCP's settings.
Latest revision as of 14:56, 8 February 2019



Logging in to Systems, Transferring and Editing Files

How do I login to SHARCNET?

You access the SHARCNET clusters using ssh. For Graham and other national systems, Compute Canada credentials are required; for the remaining legacy systems, listed here, you will require SHARCNET credentials.

Unix/Linux/OS X

To login to a system, you need to use a Secure Shell (SSH) connection. If you are logging in from a UNIX-based machine, make sure it has an SSH client (ssh) installed (this is almost always the case on UNIX/Linux/OS X). If you have the same login name on both your local system and SHARCNET, and you want to login to, say, graham, you may use the command:

ssh graham.computecanada.ca

If your Compute Canada username is different from the username on your local systems, then you may use either of the following forms:

ssh graham.computecanada.ca -l username
ssh username@graham.computecanada.ca
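If you connect frequently, you can avoid retyping the hostname and username by adding an entry to ~/.ssh/config on your local machine. This is a sketch; the alias graham and the username are placeholders for your own choices and your Compute Canada username:

```
Host graham
    HostName graham.computecanada.ca
    User username
```

With this entry in place, ssh graham behaves like the commands above.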

If you want to establish an X window connection so that you can use graphics applications such as gvim and xemacs, you can add a -Y to the command:

ssh -Y username@graham.computecanada.ca

This will automatically set the X DISPLAY variable when you login.

Windows

If you are logging in from a computer running Windows and need some pointers, we recommend consulting our SSH tutorial.

What is the difference between Login Nodes and Compute Nodes?

Login Nodes

Most of our clusters have distinct login nodes associated with them that you are automatically redirected to when you login to the cluster (some systems are logged into directly, e.g. SMPs and smaller specialty systems). You can use these to do most of your work preparing for jobs (compiling, editing configuration files) and other low-intensity tasks like moving and copying files.

You can also use them for other quick tasks, like simple post-processing, but any significant work should be submitted as a job to the compute nodes. On most login nodes, each process is limited to 1 cpu-hour; this will be noticeable if you perform anything compute-intensive, and can affect IO-oriented activity as well (such as very large scp or rsync operations).

How can I suspend and resume my session?

The program screen can start persistent terminals from which you can detach and reattach. The simplest use of screen is

screen -dR

which will either reattach you to any existing session or create a new one if none exists. To terminate the current screen session, type exit. To detach manually (you are automatically detached if the connection is lost), press ctrl+a followed by d; you can then resume later as above (ideal for running background jobs). Note that ctrl+a is screen's escape sequence, so you have to press ctrl+a followed by a to get the regular effect of pressing ctrl+a inside a screen session (e.g., moving the cursor to the start of the line in a shell).

For a list of other ctrl+a key sequences, press ctrl+a followed by ?. For further details and command line options, see the screen manual (or type man screen on any of the clusters).

Other notes:

  • If you want to create additional "text windows", use Ctrl-A Ctrl-C. Remember to type "exit" to close it.
  • To switch to a "text window" with a certain number, use Ctrl-A # (where # is 0 to 9).
  • To see a list of window numbers use Ctrl-A w
  • To be presented a list of windows and select one to use, use Ctrl-A " (This is handy if you've made too many windows.)
  • If the program running in a screen "text window" refuses to die (i.e., it needs to be killed) you can use Ctrl-A K
  • For brief help on keystrokes use Ctrl-A ?
  • For extensive help, run "man screen".

What operating systems are supported?

UNIX in general. Currently, Linux is the only operating system used within SHARCNET.

What makes a cluster different than my UNIX workstation?

If you are familiar with UNIX, then using a cluster is not much different from using a workstation. When you login to a cluster, you in fact only log in to one of the cluster nodes. In most cases, each cluster node is a physical machine, usually a server class machine, with one or several CPUs, that is more or less the same as a workstation you are familiar with. The difference is that these nodes are interconnected with special interconnect devices and the way you run your program is slightly different. Across SHARCNET clusters, you are not expected to run your program interactively. You will have to run your program through a queueing system. That also means where and when your program gets to run is not decided by you, but by the queueing system.

What programming languages are supported?

The primary programming languages C, C++ and Fortran are fully supported. Other languages, such as Java, Pascal and Ada, are also supported, but with limited technical support from us. If your program is written in any language other than C, C++ and Fortran, and you encounter a problem, we may or may not be able to solve it within a short period of time. Note: this does not mean you can't use other languages like Matlab, R, Python, Perl, etc. We normally think of those as "scripting" languages, but that doesn't imply that good HPC necessarily requires an explicitly-compiled language like Fortran.

How do I organize my files?

Main file systems on our National systems:

[Figure: layout of the main file systems on Graham]

How are file permissions handled at SHARCNET?

By default, anyone in your group can read and access your files. You can provide access to any other users by following this Knowledge Base entry.

All SHARCNET users are associated with a primary GID (group id) belonging to the PI of the group (you can see this by running id username, with your own username). This allows groups to share files without any further action, as the default file permissions for all SHARCNET storage locations (e.g. /gwork/user) allow read (list) and execute (enter/access) permissions for the group, e.g. they appear as:

  [cc_user@gra-login2 ~]$ ls -ld scratch/
   drwxrwx---+ 12 cc_user cc_user 4096 Jul 18 08:59 scratch/


Further, by default the umask value for all users is 0002, so any new files or directories will continue to provide access to the group.

Should you wish to keep your files private from all other users, you should set the permissions on the base directory to only be accessible to yourself. For example, if you don't want anyone to see files in your home directory, you'd run:

chmod 700 ~/

If you want to ensure that any new files or directories are created with different permissions, you can set your umask value. See the man page for further details by running:

man umask
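As a quick illustration of the effect of a umask (a sketch using throwaway file names; stat -c is GNU coreutils, which is what our clusters run):

```shell
# Site default umask 0002: new files get mode 664 (group-writable)
umask 0002
touch groupfile
stat -c '%a' groupfile      # prints 664

# A restrictive umask 0077: new files get mode 600 (private to you)
umask 0077
touch privatefile
stat -c '%a' privatefile    # prints 600
```

Note that umask set at the shell prompt only affects the current session; put it in your shell startup file to make it permanent.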

For further information on UNIX-based file permissions please run:

man chmod

What about really large files or if I get the error 'No space left on device' in ~/project or ~/scratch?

If you need to work with really large files we have tips on optimizing performance with our parallel filesystems here.

How do I transfer files/directories to/from or between cluster?

Unix/Linux

To transfer files to and from a cluster on a UNIX machine, you may use scp or sftp. For example, if you want to upload file foo.f to cluster graham from your machine myhost, use the following command

myhost$ scp foo.f graham.computecanada.ca:

assuming that your machine has scp installed. If you want to transfer a file from Windows or Mac, you need to have an scp or sftp client for Windows or Mac installed.

If you want to transfer file foo.f between SHARCNET clusters, say from your home directory on orca to your scratch directory on graham, simply use the following command

[username@orc-login2:~]$ scp foo.f graham:/home/username/scratch/

If you are transferring files between a UNIX machine and a cluster, you may use scp command with -r option. For instance, if you want to download the subdirectory foo in the directory project in your home directory on graham to your local UNIX machine, on your local machine, use command

myhost$ scp -rp graham.computecanada.ca:project/foo .

Similarly, you can transfer the subdirectory between SHARCNET clusters. The following command

[username@orc-login2:~]$ scp -rp graham:/home/username/scratch/foo .

will download subdirectory foo from your scratch directory on graham to your home directory on orca (note that the prompt indicates you are currently logged on to orca).

The -p option used above preserves the time stamp of each file. On Windows and Mac, check your scp client's documentation for equivalent features.

You may also tar and compress the entire directory and then use scp to save bandwidth. In the above example, first you login to graham, then do the following

[username@gra-login2:~]$ cd project
[username@gra-login2:~]$ tar -cvf foo.tar foo
[username@gra-login2:~]$ gzip foo.tar

Then on your local machine myhost, use scp to copy the tar file

myhost$ scp graham.computecanada.ca:project/foo.tar.gz .

Note that on most Linux distributions, tar has a -z option that compresses the .tar file with gzip as it is created.
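Using that option, the tar-then-gzip steps above can be combined into a single command. A sketch, reusing the hypothetical foo directory from above:

```shell
# Create and gzip-compress the archive in one step
tar -czvf foo.tar.gz foo
# List the contents of the compressed archive without extracting it
tar -tzf foo.tar.gz
# Extract (and decompress) in one step
tar -xzf foo.tar.gz
```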

Windows

You may read the instructions for transferring files with an SSH client in our SSH tutorial.

How can I best transfer large quantities of data to/from SHARCNET and what transfer rate should I expect?

In general, most users should be fine using scp or rsync to transfer data to and from SHARCNET systems. If you need to transfer a lot of files rsync is recommended to ensure that you do not need to restart the transfer from scratch should there be a connection failure. Although you can use scp and rsync to any cluster's login node(s), it is often best to use gra-dtn1.computecanada.ca - it is dedicated to data transfer.

In general one should expect the following transfer rates with scp:

  • If you are connecting to SHARCNET through a Research/Education network site (ORION, CANARIE, Internet2) and are on a fast local network (this is the case for most users connecting from academic institutions) then you should be able to attain sustained transfer speeds in excess of 10MB/s. If your path is all gigabit or better, you should be able to reach rates above 50 MB/s.
  • If you are transferring data over the wider internet, you will not be able to attain these speeds, as all traffic that does not enter/exit SHARCNET via the R&E net is restricted to a limited-bandwidth commercial feed. In this case one will typically see rates on the order of 1MB/s or less.

Keep in mind that filesystems and networks are shared resources and suffer from contention; if they are busy, the above rates may not be attainable.

For transferring large amounts of data (many gigabytes) the best approach is to use the online tool Globus.

How do I access the same file from different subdirectories on the same cluster ?

You should not need copy large files on the same cluster (e.g. from one user to another or using the same file in different subdirectories). Instead of using scp you might consider issuing a "soft link" command. Assume that you need access to the file large_file1 in subdirectory /home/user1/subdir1 and you need it to be in your subdirectory /home/my_account/my_dir from where you will invoke it under the name my_large_file1. Then go to that directory and type:

ln -s /home/user1/subdir1/large_file1    my_large_file1

Another example, assume that in subdirectory /home/my_account/PROJ1 you have several subdirectories called CASE1, CASE2, ... In each subdirectory CASEn you have a slightly different code but all of them process the same data file called test_data. Rather than copying the test_data file into each CASEn subdirectory, place test_data above i.e. in /home/my_account/PROJ1 and then in each CASEn subdirectory issue following "soft link" command:

ln -s ../test_data  test_data

The "soft links" can be removed by using the rm command. For example, to remove the soft link from /home/my_account/PROJ1/CASE2 type following command from this subdirectory:

rm -rf test_data

Typing the above command from subdirectory /home/my_account/PROJ1 would remove the actual file, and then none of the CASEn subdirectories would have access to it.
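The behaviour described above can be checked with a throwaway file (a sketch):

```shell
echo "some data" > large_file1
ln -s large_file1 my_large_file1   # create the soft link
cat my_large_file1                 # reads through the link
rm my_large_file1                  # removes only the link itself
cat large_file1                    # the original file is untouched
```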

How are files deleted from the /home/userid/scratch filesystems?

All files on /home/userid/scratch that are over 2 months old (not old in the common sense, please see below) are automatically deleted. Data needed for long term storage and reference should be kept in either ~/project or other archival storage areas. The scratch filesystem is checked at the end of the month for files which will be candidates for expiry on the 15th of the following month. On the first day of the month, a login message is posted and a notification e-mail is sent to all users who have at least one file which is a candidate for purging and containing the location of a file which lists all the candidates for purging.

An unconventional aspect of this system is that it does not determine the age of a file based on the file's attributes, e.g., the dates reported by the stat, find, ls, etc. commands. The age of a file is determined based on whether or not its data contents (i.e., the information stored in the file) have changed, and this age is stored externally to the file. Once a file is created, reading it, renaming it, changing its timestamps with the touch command, or copying it into another file are all irrelevant in terms of changing its age with respect to the purging system. The file will be expired 2 months after it was created. Only files whose contents have changed will have their age counter "reset".

Unfortunately, there currently exists no method to obtain a listing of the files that are scheduled for deletion. This is something that is being addressed, however there is no estimated time for implementation.

How do I check the age of a file

We define a file's age as the most recent of:

  • the access time (atime) and
  • the change time (ctime)

You can find the ctime of a file using

[name@server ~]$ ls -lc <filename>

while the atime can be obtained with the command

[name@server ~]$ ls -lu <filename>

We do not use the modify time (mtime) of the file because it can be modified by the user or by other programs to display incorrect information.

Ordinarily, simple use of the atime property would be sufficient, as it is updated by the system in sync with the ctime. However, userspace programs are able to alter atime, potentially to times in the past, which could result in early expiration of a file. The use of ctime as a fallback guards against this undesirable behaviour.

It is also your responsibility to manage the age of your stored data: most of the filesystems are not intended to provide an indefinite archiving service, so when a given file or directory is no longer needed, you need to move it to a more appropriate filesystem, which may well mean your personal workstation or some other storage system under your control. Moving significant amounts of data between your workstation and a Compute Canada system, or between two Compute Canada systems, should generally be done using Globus.

How to archive my data?

Use tar to archive files and directories

The primary archiving utility on all Linux and Unix-like systems is the tar command. It will bundle a bunch of files or directories together and generate a single file, called an archive file or tar-file. By convention an archive file has .tar as the file name extension. When you archive a directory with tar, it will, by default, include all the files and sub-directories contained within it, and sub-sub-directories contained in those, and so on. So the command tar --create --file project1.tar project1 will pack all the content of directory project1 into the file project1.tar. The original directory will remain unchanged, so this may double the amount of disk space occupied!

You can extract files from an archive using the same command with a different option: tar --extract --file project1.tar. If there is no directory with the original name, it will be created. If a directory of that name exists and contains files of the same names as in the archive file, they will be overwritten. Another option can be added to specify the destination directory where the archive's content will be extracted.
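With GNU tar, that option is --directory (or -C). A sketch, assuming the project1.tar archive created above and a hypothetical destination directory:

```shell
# Extract the archive's contents under dest/ instead of the
# current directory (the destination must already exist)
mkdir -p dest
tar --extract --file project1.tar --directory dest
```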

Compress and uncompress tar files

The tar archiving utility can compress an archive file at the same time it creates it. There are a number of compression methods to choose from. We recommend either xz or gzip, which can be used as follows:

[user_name@localhost]$ tar --create --xz --file project1.tar.xz project1
[user_name@localhost]$ tar --extract --xz --file project1.tar.xz
[user_name@localhost]$ tar --create --gzip --file project1.tar.gz project1
[user_name@localhost]$ tar --extract --gzip --file project1.tar.gz

Typically, --xz will produce a smaller compressed file (a "better compression ratio") but takes longer and uses more RAM while working. --gzip does not typically compress as small, but may be used if you encounter difficulties due to insufficient memory or excessive run time during tar --create. A third option, --bzip2, is also available, that typically does not compress as small as xz but takes longer than gzip.

You can also run tar --create first without compression and then use the commands xz or gzip in a separate step, although there is rarely a reason to do so. Similarly, you can run xz -d or gzip -d to decompress an archive file before running tar --extract, but again there is rarely a reason to do so.

The commands gzip or xz can be used to compress any file, not just archive files:

[user_name@localhost]$ gzip bigfile
[user_name@localhost]$ xz bigfile

These commands will produce the files bigfile.gz and bigfile.xz respectively.
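A quick round trip shows that the compression is lossless (a sketch; gzip -c writes to standard output, so the original bigfile is kept):

```shell
gzip -c bigfile > bigfile.gz       # compress, keeping the original
gzip -d -c bigfile.gz > restored   # decompress to a new file
cmp bigfile restored               # no output: files are identical
```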

Archival Storage

On Graham, files copied to ~/nearline will be subsequently moved to offline (tape-based) storage. See this link for more details.

How can I check the hidden files in directory?

The "." at the beginning of the name means that the file is "hidden". You have to use the -a option with ls to see it, i.e., ls -a.

If you want to display only the hidden files then type:

ls -d .*

Note: there is an alias which is loaded from /etc/bashrc (see your .bashrc file). The alias is defined by alias l.='ls -d .* --color=auto' and if you type:

l.

you will also display only the hidden files.

How can I count the number of files in a directory?

One can use the following command to count the number of files in a directory (in this example, your /home directory):

find /home/$USER -type f   | wc -l

It is always a good idea to archive and/or compress files that are no longer needed on the filesystem (see below). This helps minimize one's footprint on the filesystem and as such the impact they have on other users of the shared resource.

How to organize a large number of files?

With parallel cluster filesystems, you will get the best I/O performance writing data to a small number of large files. Since all metadata operations on each of our parallel filesystems are handled by a single file server, depending on how many files are being accessed, the server can become overwhelmed, leading to poor overall I/O performance for all users. If your workflow involves storing data in a large number of files, it is best to pack these files into a small number of larger archives, e.g. using the tar command

tar cvf archiveFile.tar directoryToArchive

For better performance with many files inside your archive, we recommend using DAR (Disk ARchive utility), which is a disk analog of tar (Tape ARchive). Dar can extract files from anywhere in the archive much faster than tar. The dar command is available by default on SHARCNET systems. It can be used to pack files into a dar archive by doing something like:

dar -s 1G -w -c archiveFile -g directoryToArchive

In this example we split the archive into 1GB chunks, and the archive files will be named archiveFile.1.dar, archiveFile.2.dar, and so on. To list the contents of the archive, you can type:

dar -l archiveFile

To temporarily extract files for post-processing into current directory, you would type:

dar -R . -O -x archiveFile -v -g pathToYourFile/fileToExtract

I am unable to connect to one of the clusters; when I try, I am told the connection was closed by the remote host

The most likely cause of this behaviour is repeated failed login attempts. Part of our security policies involves blocking the IP address of machines that attempt multiple logins with incorrect passwords over a short period of time---many brute-force attacks on systems do exactly this: looking for poor passwords, badly configured accounts, etc. Unfortunately, it isn't uncommon for a user to forget their password and make repeated login attempts with incorrect passwords and end up with that machine blacklisted and unable to connect at all.

A temporary solution is simply to attempt to login from another machine. If you have access to another machine at your site, you can shell to that machine first, and then shell to the SHARCNET system (as that machine's IP shouldn't be blacklisted). In order to have your machine unblocked, you will have to email support@computecanada.ca, as a system administrator must manually intervene in order to fix it.

NOTE: there are other situations that can produce this message, however they are rarer and more transient. If you are unable to log in from one machine, but can from another, it is most likely the IP blacklisting that is the problem and the above will provide a temporary work-around while your problem ticket is processed.

I am unable to ssh/scp from SHARCNET to my local computer

Most campus networks are behind some sort of firewall. If you can ssh out to SHARCNET, but cannot establish a connection in the other direction, then you are probably behind a firewall and should speak with your local system administrator or campus IT department to determine if there are any exceptions or workarounds in place.

SSH tells me SOMEONE IS DOING SOMETHING NASTY!?

Suppose you attempt to login to SHARCNET, but instead get an alarming message like this:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that the RSA host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
fe:65:ab:89:9a:23:34:5a:50:1e:05:d6:bf:ec:da:67.
Please contact your system administrator.
Add correct host key in /home/user/.ssh/known_hosts to get rid of this message.
Offending key in /home/user/.ssh/known_hosts:42
RSA host key for requin has changed and you have requested strict checking.
Host key verification failed. 

SSH begins a connection by verifying that the host you're connecting to is authentic. It does this by caching the host's "hostkey" in your ~/.ssh/known_hosts file. At times, a hostkey may be changed legitimately; when this happens, you may see such a message. It's a good idea to verify this with us; you may be able to check the fingerprint yourself by logging into another SHARCNET system and running:

ssh-keygen -l -f /etc/ssh/ssh_host_rsa_key.pub 

If the fingerprint is OK, the normal way to fix the problem is to simply remove the old hostkey from your known_hosts file. You can use your choice of editor if you're comfortable doing so (it's a plain text file, but has long lines). On a unix-compatible machine, you can also use the following very small script (Substitute the line(s) printed in the warning message illustrated above for '42' here.):

perl -pi -e 'undef $_ if (++$line == 42)' ~/.ssh/known_hosts
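With OpenSSH, a simpler alternative to editing the file by hand is ssh-keygen -R, which removes every cached key for the named host from your known_hosts (a backup is saved as known_hosts.old). Using the host from the warning above:

```shell
# Remove the stale cached key for the host named in the warning
ssh-keygen -R requin
```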

Another solution is brute-force: remove the whole known_hosts file. This throws away any authentication checking, and your first subsequent connection to any machine will prompt you to accept a newly discovered host key. If you find this prompt annoying and you aren't concerned about security, you can avoid it by adding a text file named ~/.ssh/config on your machine with the following content:

StrictHostKeyChecking no

Ssh works, but scp doesn't!

If you can ssh to a cluster successfully, but cannot scp to it, the problem is likely that your login scripts print unexpected messages which confuse scp. scp is based on the same ssh protocol, but assumes that the connection is "clean": that is, that it does not produce any un-asked-for content. If you have something like:

echo "Hello, Master; I await your command..."

scp will be confused by the salutation. To avoid this, simply ensure that the message is only printed on an interactive login:

if [ -t 0 ]; then
    echo "Hello, Master; I await your command..."
fi

or in csh/tcsh syntax:

if ( -t 0 ) then
    echo "Hello, Master; I await your command..."
endif

How do I edit my program on a cluster?

We provide a variety of editors, such as the traditional text-mode emacs and vi (vim), as well as a simpler one called nano. If you have X on your desktop (and tunneled through SSH), you can use the GUI versions (xemacs, gvim).

If your desktop supports FUSE, it's very convenient to simply mount your home tree like this:

mkdir sharcnet
sshfs graham.computecanada.ca: sharcnet

you can then use any local editor of your choice.

If you run emacs on your desktop, you can also edit a remote file from within your local emacs client using Tramp, opening and saving a file as /username@cluster.computecanada.ca:path/file.