monk
Hostname: monk.sharcnet.ca
Target usage: GPU jobs
System information: see monk system page in web portal
System status: see monk status page
Real time system data: see Ganglia monitoring page
Full list of SHARCNET systems


Monk is SHARCNET's GPGPU cluster.

System Overview

  • Number of Cores: 432
  • Number of Nodes: 54
  • Interconnect: QDR InfiniBand
  • Cores per node: 8
  • GPUs per node: 2 Tesla M2070
    • Fermi, 448 cores @ 1.15 GHz, 6 GB memory, 144 GB/s memory bandwidth, Compute Capability 2.0
  • Memory per node: 48 GB
  • CPU: Intel Xeon E5607, 4 cores @ 2.26 GHz

Optimizing for Fermi architecture

Monk uses Fermi architecture GPUs with Compute Capability 2.0. To ensure your code is optimized for this architecture, consult NVIDIA's CUDA documentation for Fermi.

The key thing to remember is to always compile with the -arch=sm_20 flag if optimized code is needed.

Job Submission

For instructions on submitting GPU jobs, please see SHARCNET's GPU Accelerated Computing documentation.

Please note that interactive GPU jobs are not supported. The cluster is optimized for batch usage; please use the development node described below for interactive work.

Development Node

Monk has a single GPU-equipped node dedicated to development work. This node does not run user jobs and is similar to a login node, except that it is only accessible from within monk. You may log into this node via ssh to work interactively. The GPUs are set to shared compute mode, which means that multiple users can access the GPUs at the same time.

Once you have logged into monk, you can access the development node with ssh mon-devel1 or ssh monk-devel1 (this is currently node 54). Once on the node, you can run your executable directly from the command line; there is no need to use sqsub. If your executable is called test.x, change into your working directory and execute:

./test.x

If the job runs for a while and you don't want to keep a terminal open, you can run it in the background so that it survives the end of your terminal session:

nohup ./test.x > test_output.out &

If you then type

exit

to log out, your job will continue running to completion, so you can log in later when it's done and examine the output in the test_output.out file.
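The steps above can be sketched end to end as follows. This is a minimal illustration of the nohup pattern, with a short sleep standing in for a real executable such as test.x; the file names are examples only.

```shell
# Start a job in the background, detached from the terminal's hangup
# signal; "sleep 2" stands in for your real executable (e.g. ./test.x).
nohup sh -c 'echo "job started"; sleep 2; echo "job finished"' > test_output.out 2>&1 &
JOB_PID=$!

# In real use you would log out here; the job keeps running because
# nohup shields it from the hangup signal sent when the session ends.
wait "$JOB_PID"

# Later, log back in and inspect the captured output:
cat test_output.out
```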

Keep in mind that the current CPU time limit for processes running on the development node is 12 hours. Most processes on this node should be much shorter than that, as it is meant for development rather than production runs.

Users must be aware of what other people are doing on the node and avoid over-consuming resources, primarily memory, as doing so will impact all users on the system. Use the free and uptime commands to see how busy the development node is.
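For example, before starting a heavy compile or test run, a quick check might look like this:

```shell
# Show memory usage in gigabytes; look at how much is free/available
# before starting anything memory-hungry.
free -g

# Show the load averages; compare them against the node's 8 cores to
# judge how busy the development node is.
uptime
```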

Compiling

To make your compile match the architecture of monk, please remember to always use nvcc with the -arch=sm_20 flag, which ensures the executable is targeted at the Compute Capability 2.0 GPU cards installed in monk. Without this flag, a more generic, less efficient executable is generated.
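A typical compile line might look like the following; test.cu and test.x are placeholder names, so substitute your own source and output files:

```shell
# -arch=sm_20 emits code for Compute Capability 2.0, matching the
# Fermi M2070 cards in monk; -O2 enables host-side optimization.
nvcc -O2 -arch=sm_20 -o test.x test.cu
```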