goblin
Hostname: goblin.sharcnet.ca
Target usage: General purpose
System information: see the goblin system page in the SHARCNET web portal
System status: see the goblin status page
Real-time system data: see the Ganglia monitoring page


System Overview

Node Range   CPU Model             Cores/node   Memory/node   Remark
1-15         AMD Opteron 2.4 GHz   4            8 GB
16-20        Intel Xeon 2.53 GHz   16           12 GB
21-36        Intel Xeon 2.27 GHz   8            48 GB
37-48        AMD Opteron 2.6 GHz   24           64 GB
49           Intel Xeon 2.3 GHz    12           32 GB         2 Intel Phi 5100 series accelerators
50           Intel Xeon 2.2 GHz    32           1024 GB       contributed by Lucian Ilie (uwo)
51-54        Intel Xeon 2.0 GHz    12           256 GB        contributed by Lucian Ilie (uwo)

Interconnect: Gigabit Ethernet
/scratch storage: no longer available
OS: CentOS 6.3

Some of the remaining node ranges are dedicated to Peter Rogan (uwo) or were contributed by Lance Lochner (uwo).

For system notices and history, please visit the goblin system page in the SHARCNET web portal.

goblin.sharcnet.ca is a contributed Gigabit Ethernet cluster. Note that contributors' jobs may suspend regular users' jobs at any time, delaying them for up to 7 days. If your software uses licenses, you should check that your jobs are not being suspended and tying up licenses unduly.

System Access and User Environment

Login Nodes

The login nodes impose per-user resource limits, which can be inspected with ulimit; note in particular the 1800-second CPU time limit and the cap of 100 user processes:

[isaac@gb241:~] ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 62940
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) 128000
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) 1800
max user processes              (-u) 100
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
[isaac@gb241:~] 

While on any SHARCNET system, one can use the glssys command from the g-tools util module to see node details. For example:

module load util
glssys -s goblin -l

The system is suitable for general-purpose use.

Notes on I/O

The /scratch file system on goblin is no longer available. Users are advised to consider the following:

1) Run jobs on a cluster that has /scratch (e.g. orca or saw) if your program performs frequent disk I/O.

2) Write checkpoint files to global work. On goblin, global work is network based, but it is designed for large files such as checkpoints and should handle this I/O profile better.

3) Write checkpoint files to /tmp and delete them when no longer needed. Local disks can handle the I/O from a smaller set of concurrent jobs, and a checkpoint can be recovered from the node if needed (it is persistent across node crashes). A sketch of this approach follows this list.

4) Increase the interval between checkpoints to 12 hours or more.

5) Run fewer concurrent jobs.
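
As an illustration of option 3, here is a minimal job-script sketch; the program name, its checkpoint option, and the destination directory under global work are hypothetical and should be adapted to your own code and paths:

#!/bin/bash
# Write checkpoints to node-local /tmp, then copy the last one back to
# global work before the job ends (names and paths are illustrative).
CKPT_DIR=/tmp/$USER/ckpt_$$
mkdir -p "$CKPT_DIR"

# Hypothetical program that accepts a checkpoint-directory option
./my_program --checkpoint-dir "$CKPT_DIR"

# Keep a copy of the final checkpoint in global work, then clean up /tmp
cp "$CKPT_DIR"/*.chk /work/$USER/checkpoints/ 2>/dev/null
rm -rf "$CKPT_DIR"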

The Phi Co-processors

Node 49 of goblin has two Intel Phi 5100 series coprocessors, each present as a card (also known as a MIC card, for Intel Many Integrated Core architecture) on the main board. Each card has 60 cores running 4 threads per core, for a total of 240 hardware threads.

An Intel Phi coprocessor is essentially an x86 SMP system on a chip, so it is straightforward to port existing code to it. Using OpenMP is perhaps the easiest way to accelerate performance, by running the existing code in parallel threads.

Access to the MIC cards

Access to the Phi coprocessors is by SSH key only and requires that the administrators add your key to the systems. You must email help@sharcnet.ca to request access.

To use the Phi coprocessors, first ssh to goblin with SSH agent forwarding enabled, e.g. from a Unix-like environment:

ssh goblin.sharcnet.ca -A

then ssh to gb49 with the command

ssh -A gb49

Then ssh to one of the MIC cards (mic0 or mic1), e.g.

ssh mic0

You will see a Linux subsystem, with your home directory and global work (/global/a/work/you or /global/b/work/you) mounted.

Please note that the Linux environment on the accelerator only offers the Bourne shell (sh), not bash, so some familiar shell features may not work.
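
Once logged in to a card, a quick sanity check is to confirm which host you are on and how many logical CPUs it exposes (a sketch; the exact count may differ slightly from the nominal 240 threads):

hostname
grep -c processor /proc/cpuinfo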

Troubleshooting SSH access to MIC cards

For access to work, SSH agent forwarding has to be enabled.

On a Linux system, you can do this by editing:

$HOME/.ssh/config

and adding

Host *
ForwardAgent yes

On a Mac, these lines can be added to the file /etc/ssh_config.

If you are still not successful after this, execute "ssh-add" and try again.
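
If ssh-add complains that it cannot connect to an authentication agent, a minimal sketch of starting one and loading your default key is:

eval "$(ssh-agent -s)"   # start an agent in the current shell
ssh-add                  # add your default private key
ssh-add -l               # confirm which keys the agent now holds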

Compiling PHI accelerated programs

One must use the most recent versions of the Intel compilers and MKL libraries, and all programs must be compiled on gb49, the Phi host node, where the Intel Manycore Platform Software Stack (MPSS) is installed.

First login:

ssh -A goblin.sharcnet.ca
ssh -A gb49

Now one must source the Intel compiler environment by hand, since the module environment is not available on gb49:

source /opt/sharcnet/intel/14.0.0/icc/bin/compilervars.sh intel64

Additionally, one must specify the Intel compiler license file path:

export INTEL_LICENSE_FILE=/opt/sharcnet/intel/14.0.0/license/intel.lic

If one is using MKL, then it must also be loaded by hand:

source /opt/sharcnet/mkl/11.1/bin/compilervars.sh intel64

At this point one should be able to compile Phi code to run natively on the card, or to use the Phi as an offload coprocessor. For example, to compile an MKL program for native execution on the Phi, one would execute:

icpc -mmic source.cpp -openmp -mkl

The resulting binary can then be run on the Phi.
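
Putting the steps together, here is a minimal end-to-end sketch; the source file name, binary name, and thread count are illustrative, and the binary is placed in the home directory because it is also mounted on the cards:

# On gb49: set up the compiler environment and build a native MIC binary
source /opt/sharcnet/intel/14.0.0/icc/bin/compilervars.sh intel64
export INTEL_LICENSE_FILE=/opt/sharcnet/intel/14.0.0/license/intel.lic
source /opt/sharcnet/mkl/11.1/bin/compilervars.sh intel64
icpc -mmic -openmp -mkl my_source.cpp -o $HOME/my_prog.mic

# On the card (note that the MIC builds of the runtime libraries, e.g.
# libiomp5.so, must also be visible there; copying them into $HOME and
# extending LD_LIBRARY_PATH is one way to achieve that)
ssh mic0
export OMP_NUM_THREADS=240   # one thread per hardware thread, as an example
cd $HOME && ./my_prog.mic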

Further Reading

Submitting Jobs

Contributors have preferential access to the resources. Contributors' jobs may suspend regular users' jobs at any time, delaying them for up to 7 days (suspended jobs will be in state "Z"). Because of this, test jobs on this system are not guaranteed to start within 60 seconds. Users should run test jobs elsewhere where possible.
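
To check whether your own jobs have been suspended, list them with the standard SHARCNET sq tools; a minimal sketch:

sqjobs    # list your jobs; suspended jobs appear in state Z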

In general, users will experience the best performance by ensuring that their jobs use whole nodes. Measurements have shown that when MPI jobs share nodes with other jobs, they slow down, depending on the degree of resource contention.

This means that, in general, MPI jobs should request a core count that is a multiple of the per-node core count of the nodes they will run on (goblin's nodes have between 4 and 32 cores each; see the System Overview above). Threaded jobs are confined to a single node, so they can use at most the number of cores that node provides.

When submitting MPI jobs, one should use the -N and -n flags to ensure the job is scheduled onto full nodes. For example, to run a 64-process job packed onto eight whole 8-core nodes, one would submit it as:

sqsub -q mpi -n 64 -N 8 <...>

It is important to include -N 8 to ensure the job is not scattered across nodes where other users' jobs are running.
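
A fuller submission line is sketched below; the runtime, output file, memory-per-process value, and program name are illustrative:

sqsub -q mpi -n 64 -N 8 -r 7d -o my_prog.out --mpp=2g ./my_mpi_prog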