|Target usage: General purpose|
|System information: see goblin system page in web portal|
|System status: see goblin status page|
|Real time system data: see Ganglia monitoring page|
|Full list of SHARCNET systems|
CPU Model:        AMD Opteron 2.4 GHz | Intel Xeon 2.53 GHz | Intel Xeon 2.27 GHz | AMD Opteron 2.6 GHz | Intel Xeon 2.3 GHz | Intel Xeon 2.2 GHz | Intel Xeon 2.0 GHz
Memory/node:      8 GB | 12 GB | 48 GB | 64 GB | 32 GB | 1024 GB | 256 GB
Remark:           dedicated to Peter Rogan (uwo) | contributed by Lance Lochner (uwo) | 2 Intel Phi 5100 series accelerators | contributed by Lucian Ilie (uwo) | contributed by Lucian Ilie (uwo)
/scratch Storage: No longer available
For system notice/history, please visit the Goblin system page in the SHARCNET web portal.
goblin.sharcnet.ca is a contributed gigabit ethernet cluster. Note that contributor jobs may suspend regular user jobs at any time, delaying them for up to 7 days. If your software uses licenses you should check that your jobs are not being suspended and tying up licenses unduly.
System Access and User Environment
[isaac@gb241:~] ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 62940
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) 128000
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) 1800
max user processes              (-u) 100
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
[isaac@gb241:~]
While on any SHARCNET system, one can use the g-tools util command glssys to see node details. For example
module load util
glssys -s goblin -l
The system is suitable for general purpose use.
Notes on I/O
The /scratch file system on goblin is no longer available. Users are advised to consider the following:
1) If your program performs frequent disk I/O, run jobs on a cluster that has /scratch, e.g. orca or saw.
2) Write checkpoint files to global work. On goblin, global work is network based, but it is designed for large files such as checkpoints and should handle that I/O profile better.
3) Write checkpoint files to /tmp, and delete them when no longer needed. Local disks can handle the I/O from a smaller set of concurrent jobs, and the checkpoint can be recovered from the node if needed (it is persistent across node crashes).
4) Reduce the checkpoint frequency so that checkpoints are written no more often than every 12 hours.
5) Run fewer concurrent jobs.
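Option 3 above can be sketched as a fragment of a job script. All names below (the program, checkpoint file, and destination directory) are illustrative placeholders, not goblin-specific paths:

```shell
# Sketch: keep frequent checkpoints on node-local /tmp, and copy only the
# final checkpoint to a network location. All paths/names are placeholders.
CKPT_DIR=/tmp/$USER/ckpt_$$       # per-job directory on local disk
FINAL_DIR=$HOME/ckpt_archive      # stand-in for your global work directory

mkdir -p "$CKPT_DIR" "$FINAL_DIR"

# ... your program writes its checkpoints into $CKPT_DIR here, e.g.:
# ./myapp --checkpoint-dir "$CKPT_DIR"
echo "state at step 1000" > "$CKPT_DIR/checkpoint.dat"   # placeholder

# After the run, keep only the last checkpoint and clean up /tmp.
cp "$CKPT_DIR/checkpoint.dat" "$FINAL_DIR/"
rm -rf "$CKPT_DIR"
echo "saved: $FINAL_DIR/checkpoint.dat"
```

This keeps the frequent I/O on the local disk and touches the network file system only once per job.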
The Phi Co-processors
Node 49 of goblin (gb49) has two Intel Phi 5100 series coprocessors, each present as a system card (also known as MIC cards, for Intel Many Integrated Core architecture) on the main board. Each one has 60 cores running 4 threads per core, for a total of 240 threads.
An Intel Phi coprocessor is essentially an x86 SMP on a chip, so porting existing code to it is straightforward. Using OpenMP is perhaps the easiest way of accelerating performance by running the existing code in parallel threads.
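For example, an OpenMP program compiled natively for the Phi can be told to use all of the card's hardware threads through the standard OMP_NUM_THREADS environment variable. The binary name myapp.mic below is a hypothetical placeholder:

```shell
# Request all 240 hardware threads (60 cores x 4 threads/core) for an
# OpenMP program running natively on the Phi. myapp.mic is a placeholder.
export OMP_NUM_THREADS=240
echo "OpenMP threads requested: $OMP_NUM_THREADS"
# ./myapp.mic      # launch the natively compiled binary on the card
```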
Access to the MIC cards
Access to the Phi processors is by SSH key only and requires that the administrators add your key to the systems. You must email email@example.com to request access.
To use the Phi processors, first ssh to goblin with SSH agent forwarding enabled, e.g. from a Unix-like environment:
ssh goblin.sharcnet.ca -A
then ssh to gb49 with command
ssh -A gb49
Then ssh to one of the MIC cards (mic0 or mic1), e.g.
ssh mic0
You will see a Linux subsystem, with your home directory and global work (/global/a/work/you or /global/b/work/you) mounted.
Please note that the Linux environment on the accelerator only offers the Bourne shell (sh), not bash, so some familiar shell features may not work.
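Because only sh is available, any scripts run on the card should stick to POSIX syntax. As a sketch, a bash-style counting loop (e.g. `for ((i=0; i<3; i++))`, which plain sh rejects) can be written portably like this:

```shell
# POSIX-compliant loop: `while`, `[ ... ]`, and $(( ... )) arithmetic
# expansion work under plain sh; bash arrays and (( )) loops do not.
i=0
while [ "$i" -lt 3 ]; do
    echo "iteration $i"
    i=$((i + 1))
done
```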
Troubleshooting SSH access to MIC cards
For access to work, SSH agent forwarding has to be enabled.
On a Linux system, you can do this by adding the following lines to ~/.ssh/config:
Host *
    ForwardAgent yes
On a Mac, these lines can be added to file /etc/ssh_config
If you are still not successful after this, run "ssh-add" to load your key into the agent and try again.
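A quick way to check whether an agent is available and holds your key before attempting the hop to gb49 (the diagnostic messages are illustrative):

```shell
# Diagnose agent forwarding: SSH_AUTH_SOCK must point at a running agent,
# and `ssh-add -l` must list at least one key.
if [ -z "$SSH_AUTH_SOCK" ]; then
    MSG="no ssh-agent detected: start one with: eval \"\$(ssh-agent -s)\""
elif ssh-add -l >/dev/null 2>&1; then
    MSG="agent is running and holds at least one key"
else
    MSG="agent has no keys loaded: run ssh-add"
fi
echo "$MSG"
```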
Compiling PHI accelerated programs
One must use the most recent versions of the Intel compilers and MKL libraries, and all programs must be compiled on gb49, the Phi host node, where the Intel Manycore Platform Software Stack (MPSS) is installed.
ssh -A goblin.sharcnet.ca
ssh -A gb49
Now one must source the Intel compiler by hand, since the module environment is not available on gb49:
source /opt/sharcnet/intel/14.0.0/icc/bin/compilervars.sh intel64
Additionally, one must specify the path to the Intel compiler license file.
If one is using MKL, then it must also be loaded by hand:
source /opt/sharcnet/mkl/11.1/bin/compilervars.sh intel64
At this point one should be able to compile PHI code to run natively, or using the PHI as a coprocessor. For example, to compile an MKL program for native execution on the PHI one would execute:
icpc -mmic source.cpp -openmp -mkl
The resulting binary can then be run on the PHI.
Contributors have preferential access to the resources. Contributors' jobs may suspend regular users' jobs at any time, delaying them for up to 7 days (suspended jobs will be in state "Z"). Because of this, test jobs on the system are not guaranteed to start within 60 seconds. Users should run test jobs elsewhere if necessary.
In general, users will experience the best performance on saw by ensuring that their jobs use whole nodes. Measurements have shown that MPI jobs sharing nodes with other jobs slow down, depending on resource contention.
This means that, in general, MPI jobs should use multiples of 8 cores. Threaded jobs can use up to 8 cores, as all of saw's nodes have 8-core Xeons.
When submitting MPI jobs, one should use the -N and -n flags to ensure a job is scheduled onto full nodes. For example, if your program is going to use 64 processes, one would submit it as:
sqsub -q mpi -n 64 -N 8 <...>
It is important to include -N 8 to ensure the job is not scattered across nodes where other users' jobs are running.
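The whole-node rule above can be checked with a little shell arithmetic. The job size is illustrative, and the snippet only prints the submit line (it submits nothing); the value 8 for -N is taken from the example above:

```shell
# For an 8-core-per-node cluster, warn if the process count does not fill
# whole nodes, then print the matching sqsub line. Illustrative only.
CORES_PER_NODE=8
NPROCS=64

if [ $((NPROCS % CORES_PER_NODE)) -ne 0 ]; then
    echo "warning: $NPROCS is not a multiple of $CORES_PER_NODE; round up"
fi
NODES=$((NPROCS / CORES_PER_NODE))
echo "sqsub -q mpi -n $NPROCS -N $NODES ./myapp"
```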