Revision as of 13:21, 10 May 2019

Knowledge Base / Expanded FAQ

Note: In the Fall of 2018 this FAQ is receiving a major update to account for retirement of old systems. If you still want to see the old FAQ, a snapshot of it is available at this link. Some of the information on this page has now been moved to the Legacy Systems page.

This page is a comprehensive collection of essential information needed to use SHARCNET, gathered conveniently on a single page of our Help Wiki. If you are a new SHARCNET user, this page most likely contains all you need to get going on SHARCNET. However, there is much more information in this Help Wiki. Please use the search box to find pages that may be relevant to you. You can also go to the Main Page of this wiki for a general table of contents. Finally, you can also look at the list of all articles in this Help Wiki or a list of all categories.




About SHARCNET

What is SHARCNET?

SHARCNET stands for Shared Hierarchical Academic Research Computing Network. Established in 2000, SHARCNET is the largest high performance computing consortium in Canada, involving 18 universities and colleges across southern, central and northern Ontario.

SHARCNET is a member consortium in the Compute/Calcul Canada national HPC platform.

Where is SHARCNET?

The main office of SHARCNET is located in the Western Science Centre at The University of Western Ontario. The SHARCNET high performance clusters are installed at a number of the member institutions in the consortium and operated by SHARCNET staff across different sites.

What does SHARCNET have?

The primary SHARCNET compute system is the Graham heterogeneous cluster located at the University of Waterloo. It is named after Wes Graham, the first director of the Computing Centre at Waterloo. It consists of 36,160 cores and 320 GPU devices, spread across 1,127 nodes of different configurations.

What can I do with SHARCNET?

If you have a program that takes months to run on your PC, you could probably run it within a few hours using hundreds of processors on the SHARCNET clusters, provided your program is inherently parallelisable. If you have hundreds or thousands of test cases to run on your PC or the computers in your lab, running those cases independently on hundreds of processors will significantly shorten your test cycles.

If you have used Beowulf clusters made of commodity PCs, you may notice a performance improvement on SHARCNET clusters, which have high-speed InfiniBand interconnects, and on SHARCNET machines with large amounts of memory. SHARCNET clusters are also connected to each other through a dedicated, private connection over the Ontario Research Innovation Optical Network (ORION).

If you have access to supercomputing facilities elsewhere and wish to share your ideas with us and SHARCNET users, please contact us. Together we can make SHARCNET better.

Who is running SHARCNET?

The daily operation and development of SHARCNET computational facilities is managed by a group of highly qualified system administrators. In addition, we have a team of high performance technical computing consultants, who are responsible for technical support on libraries, programming and application analysis.

How do I contact SHARCNET?

For technical inquiries, you may send E-mail to help@sharcnet.ca, or contact your local system administrator or HPC specialist. For general inquiries, you may contact the SHARCNET main office.

Getting an Account with SHARCNET and Related Issues

To use SHARCNET (and also Compute Canada) facilities one has to apply for a Compute Canada account.

Programming and Debugging

What is MPI?

MPI stands for Message Passing Interface, a standard for writing portable parallel programs that is well accepted in the scientific computing community. MPI is implemented as a library of subroutines layered on top of a network interface. The MPI standard defines C and Fortran interfaces (C++ programs normally call the C interface, since the separate C++ bindings were removed in MPI-3.0), so all of these languages can use MPI. There are several MPI implementations, including OpenMPI and MPICH. Specific high-performance interconnect vendors also provide their own libraries, usually a version of MPICH layered on an interconnect-specific hardware library.

For an introduction to MPI, refer to the MPI tutorial.

In addition to C/C++ and Fortran versions of MPI, there exist other language bindings as well. If you have any special needs, please contact us.

What is OpenMP?

OpenMP is a standard for programming shared memory systems using threads, with compiler directives inserted in the source code. It provides a higher-level approach to utilizing multiple processors within a single machine while keeping the structure of the source code as close to its conventional form as possible. OpenMP is much easier to use than lower-level alternatives such as Pthreads, and is thus suitable for adding modest amounts of parallelism to pre-existing code. Because OpenMP is expressed mainly through compiler directives (pragmas in C/C++, special comments in Fortran), which a compiler without OpenMP support ignores, your code can still be compiled by a serial compiler and should still behave the same, as long as any calls to OpenMP runtime functions are guarded.

OpenMP for C/C++ and Fortran is supported by many compilers, including the PathScale and PGI compilers for Opterons, and the Intel compilers for IA32 and IA64 (such as SGI's Altix). OpenMP support has been provided in the GNU compiler suite since v4.2 (OpenMP 2.5), and starting with v4.4 it supports the OpenMP 3.0 standard.

How do I run an OpenMP program with multiple threads?

An OpenMP program uses a single process with multiple threads rather than multiple processes. On multicore (i.e. practically all) systems, threads are scheduled on the available processors and thus run concurrently. For each thread to run on its own processor, request as many CPUs as the number of threads you plan to use. To run an OpenMP program ompHello with four threads, use the following job submission script.

The option --cpus-per-task=4 reserves 4 CPUs for the single process, and setting OMP_NUM_THREADS from $SLURM_CPUS_PER_TASK makes the program start the same number of threads.

#!/bin/bash
#SBATCH --account=def-someuser
#SBATCH --time=0-0:5
#SBATCH --cpus-per-task=4
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./ompHello

For a basic introduction, refer to the OpenMP tutorial.

How do I measure the CPU time when running a multi-threaded job?

If you submit a job through the scheduler, then timing information will be collected by the scheduler itself and stored for later query.

If you are running an OpenMP program interactively, you can use the time utility to collect information.

In a typical example using 8 threads,

export OMP_NUM_THREADS=8
time ./ompHello

Your output will be something like:

real	0m1.633s
user	0m1.132s
sys	0m0.917s

In this example the real and user times are comparable, so this particular program is not benefiting much from multithreading. Because user and sys times are summed over all threads, real (wall-clock) time should be noticeably less than user time when parallel execution is occurring.

What mathematics libraries are available?

Every system has the basic linear algebra libraries BLAS and LAPACK installed. Normally, these interfaces are contained in vendor-tuned libraries. On Intel-based (Xeon) clusters it's probably best to use the Intel math kernel library (MKL). On Opteron-based clusters, AMD's ACML library is available. However, either library will work reasonably well on both types of systems. If one expects to do a large amount of computation, it is generally advisable to benchmark both libraries so that one selects the one offering best performance for a given problem and system.

One may also find the GNU Scientific Library (GSL) useful for particular needs. GSL is an optional package, available on any machine.

For a detailed list of libraries on each cluster, please check the documentation on the corresponding SHARCNET satellite web sites.

How do I use mathematics libraries such as BLAS and LAPACK routines?

First you need to know which subroutine you want to use. Check the references to find the routines that meet your needs. Then place calls to those routines in your program and compile it against the particular libraries that provide them. For instance, if you want to compute the eigenvalues, and optionally the eigenvectors, of an N by N real nonsymmetric matrix in double precision, you will find that the LAPACK routine DGEEV does that. All you need to do is call DGEEV with the required parameters as specified in the LAPACK documentation, and compile your program to link against the LAPACK library.
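For reference, the quantities DGEEV computes can be summarized as below: right eigenvectors v_j, optional left eigenvectors u_j, and eigenvalues returned as separate real and imaginary parts in the arrays WR and WI, with complex conjugate pairs stored consecutively, positive imaginary part first. The 2-by-2 rotation matrix is only an illustrative example; the LAPACK documentation for DGEEV remains the authoritative description of its arguments.

```latex
A\,v_j = \lambda_j v_j, \qquad u_j^{H} A = \lambda_j u_j^{H}, \qquad \lambda_j = \mathrm{WR}(j) + i\,\mathrm{WI}(j)

A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}:\quad
\det(A - \lambda I) = \lambda^2 + 1 = 0
\;\Rightarrow\; \lambda = \pm i,
\qquad \mathrm{WR} = (0,\ 0),\quad \mathrm{WI} = (1,\ -1)
```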

Now to compile the program, you need to link it to a library that contains the LAPACK routines you call in your code. The general recommendation is to use Intel's MKL library, which has a module loaded by default on most Compute Canada/SHARCNET systems. The instructions on how to link your code with these libraries at compile time are provided on the MKL page.


My code is written in C/C++, can I still use those libraries?

Yes. Most of the libraries have C interfaces. If you are not sure about the C interface, or you need assistance in using libraries written in Fortran, we can help you out on a case-by-case basis.

What packages are available?

Various packages have been installed on Compute Canada/SHARCNET clusters at users' requests. The full, up-to-date list is available on the Compute Canada documentation wiki (link). If you do not see a package that you need on this list, please request it by submitting a problem ticket.

You can also search this wiki or the Compute Canada wiki for the package you are interested in to see if there is any additional information about it available.

We can also help you with compiling/installing a package into your own file space if you prefer that approach.

What interconnects are used on SHARCNET clusters?

Currently, several different interconnects are being used on SHARCNET clusters: Quadrics, Myrinet, InfiniBand and standard IP-based Ethernet.

Debugging serial and parallel programs

A debugger is a program that helps identify mistakes ("bugs") in programs, either at run time or "post-mortem" (by analyzing the core file produced by a crashed program). Debuggers can be either command-line or GUI (graphical user interface) based. Before a program can be debugged, it needs to be (re-)compiled with the -g switch, which tells the compiler to include symbolic information in the executable. For MPI problems on the HP XC clusters, -ldmpi links in the HP MPI diagnostic library, which is very helpful for discovering incorrect use of the API.

SHARCNET highly recommends using our commercial debugger DDT. It has a very friendly GUI, and can also be used for debugging serial, threaded, MPI, and CUDA (GPGPU) programs. A short description of DDT and cluster availability information can be found on its software page. Please also refer to our detailed Parallel Debugging with DDT tutorial.

SHARCNET also provides gdb (installed on all clusters, type "man gdb" to get a list of options and see our Common Bugs and Debugging with gdb tutorial).


What is NaN?

NaN stands for "Not a Number". It is an undefined or unrepresentable value, typically encountered in floating-point arithmetic (e.g. the square root of a negative number). To debug this in your program, one typically has to unmask or trap floating-point exceptions. This is fairly straightforward with Fortran compilers (e.g. with Intel's ifort one simply needs to add one switch, "-fpe0"), but somewhat more complicated with C/C++ codes, where the best solution is to use the feenableexcept() function. There are further details in the Common Bugs and Debugging with gdb tutorial.

My program exited with an error code XXX - what does it mean?

Your application crashed, producing an error code XXX (where XXX is a number). What does it mean? The answer may depend on your application. Normally, user codes do not use the first 130 or so error codes, which are reserved for operating system error codes. On most of our clusters, typing

 perror  XXX

will print a short description of the error. (This is a MySQL utility, and for XXX>122 it will start printing only MySQL-related error messages.) An accurate list of the system error codes for the current operating system can be found on our clusters in the file /usr/include/asm-x86_64/errno.h (/usr/include/asm-generic/errno.h on some systems).


Getting Help

I have encountered a problem while using a Compute Canada/SHARCNET system and need help, who should I talk to?

If you have access to the Internet, we encourage you to use the problem ticketing system (described in detail below). This is the most efficient way of reporting a problem, as it minimizes email traffic and will likely result in a faster response than other channels.

You are also welcome to contact system administrators and/or high performance technical computing consultants at any time. You may find their contact information on the directory page.

How long should I expect to wait for support?

Unfortunately, Compute Canada/SHARCNET does not have adequate funding to provide support 24 hours a day, 7 days a week. User support and system monitoring are limited to regular business hours: there is no official support on weekends or holidays, or outside 9:00 - 17:00 EST.

Please note that this includes monitoring of our systems and operations, so typically when there are problems overnight or on weekends/holidays system notices will not be posted until the next business day.

Compute Canada Problem Ticket System

What is a "problem ticket system"?

This is a system that allows anyone with a Compute Canada account to start a persistent email thread that is referred to as a "problem ticket". When a user submits a new ticket it will be brought to the attention of an appropriate and available Compute Canada/SHARCNET staff member for resolution.

You can interact with the ticket system entirely via email. There is also a web interface to see tickets you have submitted in the past.

What do I need to specify in a ticket?

To help us address your question faster, please try to do the following when submitting a ticket:

  1. specify which of our systems is involved
  2. if the problem pertains to a job, then report the jobid associated with the job; this is an integer that is returned by the scheduler when you submit the job
  3. report the exact commands necessary to duplicate the problem, as well as any error output that helps identify the problem; if relevant, this should include how the code is compiled, how the job is submitted, and/or anything else you are doing from the command line relating to the problem
  4. if you'd like for a particular staff member to be aware of the ticket, mention them

How do I submit a ticket?

In general, you can submit a new ticket by emailing support@computecanada.ca with the email address associated with your Compute Canada account. If you are using another email address, please provide your full name, your Compute Canada default username (if available) and your university or institution.

If you like, you can also target your inquiry more specifically, by using the following addresses to submit your ticket:

I am new to parallel programming, where can I find quick references at SHARCNET?

SHARCNET has a number of training modules on parallel programming using MPI, OpenMP, pthreads and other frameworks. Each of these modules has working examples that are designed to be easy to understand while illustrating basic concepts. You may find these along with copies of slides from related presentations and links to external resources on the Main Page of this training/help site.

I am new to parallel programming, can you help me get started with my project?

Absolutely. We will be glad to help you with everything from planning the project, architecting your application with appropriate algorithms and choosing efficient tools to solve the associated numerical problems, to debugging and analyzing your code. We will do our best to help you speed up your research. If your programming project would require a significant amount of staff time, you should consider applying for Dedicated Programming support. (We run the competition annually; see https://www.sharcnet.ca/my/research/programming).

Can you install a package on a cluster for me?

Certainly. We suggest you make the request by sending e-mail to help@sharcnet.ca with the specific request.

I am in a process of purchasing computer equipment for my research, would you be able to provide technical advice on that?

If you tell us what you want, we may be able to help you out.

Does SHARCNET provide any training on programming and using the systems?

Yes. SHARCNET provides workshops on specific topics from time to time and offers courses at some sites. Every summer (usually late May to early June), SHARCNET holds an annual HPC Summer School with a variety of in-depth, hands-on workshops. Many materials from past workshops and presentations can be found on SHARCNET's web portal.

SHARCNET also offers a series of online seminars (so-called "General interest webinars"), typically delivered every second Wednesday at lunch time. These are announced via the SHARCNET events mailing list, and the schedule is available on the SHARCNET event calendar. Past seminars are recorded and posted on our YouTube channel. A full listing of past webinars is available on the Online Seminars page.

Attending SHARCNET Webinars

SHARCNET makes a number of seminar events available online (New User Seminar, general interest talks, etc.) using software/services from Vidyo. Vidyo allows both the presenter and the attendees to offer or participate in online seminars by using their web browser or installing a small application. If this is your first Vidyo seminar please join the seminar ahead of the official start, to sort out any technical issues. Vidyo is supported on most platforms, both "stationary" (Windows, MacOS, Linux) and mobile (iOS, Android).

Please note that if your device has a microphone (highly recommended) and/or webcam, Vidyo will use them to transmit your audio and video to all seminar participants. They are on by default, but you can always disable them by clicking the corresponding button at the bottom of your Vidyo window. We ask that all attendees keep their microphones muted unless they want to ask a question.

We normally record our seminars and make them available to all SHARCNET users. All recent and new webinars are posted on our YouTube channel, http://youtube.sharcnet.ca . The links to the video recordings, slides and abstracts can be found on our online seminars page.

If you do not have headphones and/or a microphone, we provide a toll-free call-in option: 1-855-728-4677, ext 5542.

To receive email notifications about upcoming General Interest seminars, Summer Schools, and other training events, add your email to our Events mailing list.

Please note that times for our webinars are for the Eastern Time (EST/EDT) zone.

Research at SHARCNET

I have a research project I would like to collaborate on with SHARCNET, who should I talk to?

You may contact SHARCNET head office or contact members of the SHARCNET technical staff.

How can I contribute compute resources to SHARCNET so that other researchers can share it?

Most people's research is "bursty" - there are usually sparse periods of time when some computation is urgently needed, and other periods when there is less demand. One problem with this is that if you purchase the equipment you need to meet your "burst" needs, it'll probably sit, underutilized, during other times.

An alternative is to donate control of this equipment to SHARCNET and let us arrange for other users to use it when you are not. We prefer to be involved in the selection and configuration of such equipment. Our promise to contributors is that, as much as possible, they should obtain as much benefit from the cluster as if it were not shared. Owners get preferential access. Naturally, owners are also able to burst to higher peak usage, since their equipment has been pooled with other contributions. (Technically, SHARCNET cannot itself own such equipment: it remains owned by the institution in question, and will be returned to the contributor upon request.) If you think this model will also work for you and you would like to contribute your computational resources to help the research community at SHARCNET, you can contact us to make such an arrangement.

I do not know much about computation, nor is it my research interest. But I am interested in getting my research done faster with the help of the high performance computing technology. In other words, I do not care about the process and mechanism, but only the final results. Can SHARCNET provide this type of help?

We will be happy to bring the technology of high performance computing to you to accelerate your research, if at all possible. If you would like to discuss your plan with us, please feel free to contact our high performance computing specialists. They will be happy to listen to your needs and are ready to provide appropriate suggestions and assistance.

I need access to more CPU cores or storage than are available by default, what programs exist to support demanding computation?

SHARCNET participates in the Compute Canada NRAC (National Resource Allocation Competition) and provides a continual competition for groups that require more than the default level of access to our resources. Please see Dedicated Resources for further information.

I heard SHARCNET offers fellowships, where can I get more information?

SHARCNET no longer actively runs a fellowship program. You may find information regarding past fellowships and other dedicated resource opportunities on the Research Fellowships page of the web portal.

I would like to do some research at SHARCNET as a visiting scholar, how should I apply?

In general, you will need to find a hosting department or a person affiliated with one of the SHARCNET institutions. You may also contact us directly for more specific information.

I would like to send my students to SHARCNET to do some work for me. How should I proceed?

See above.



Contacting SHARCNET

How do I contact SHARCNET for research, academic exchanges, and technical issues?

Please contact SHARCNET head office.

How do I contact SHARCNET for business development, education and other issues?

Please contact SHARCNET head office.

How do I contact a specific staff member at SHARCNET?

See staff directory for contact information.

How to Acknowledge SHARCNET in Publications

How do I acknowledge SHARCNET in my publications?

We recommend citing the following:

This work was made possible by the facilities of the Shared Hierarchical 
Academic Research Computing Network (SHARCNET:www.sharcnet.ca) and Compute/Calcul Canada.

I've seen different spellings of the name, what is the standard spelling of SHARCNET?

We suggest the spelling SHARCNET, all in upper case.


What types of research programs / support are provided to the research community?

Our overall intent is to provide support that can both respond to the range of needs that the user community presents and help to increase the sophistication of the community and enable new and larger-in-scope applications making use of SHARCNET's HPC facilities. The range of support can perhaps best be understood in terms of a pyramid:

Level 1

At the apex of the pyramid, SHARCNET supports a small number of projects with dedicated programmer support. The intent is to enable projects that will have a lasting impact and may lead to a "step change" in the way research is done at SHARCNET. Inter-disciplinary and inter-institutional projects are particularly welcomed. For the latest information about the program, including application guidelines, please see the Programming Competition page in our web portal.

Level 2

The middle layers of support are provided through a number of initiatives.

These include:

  • Programming support of more modest duration (several days to one month engagement, usually part time)
  • Training on a variety of topics through workshops, seminars and online training materials
  • Consultation. This may include user-initiated interactions on particular programs, algorithms, techniques, debugging, optimization etc., as well as unsolicited help to ensure effective use of SHARCNET systems
  • Site Leaders play an important role in working with the community to help researchers connect with SHARCNET staff and to obtain appropriate help and support.

Level 3

The base level of the pyramid handles the very large number of small requests that are essential to keeping the user community working effectively with the infrastructure on a day-to-day basis. Several of these can be answered by this FAQ; many of the issues are presented through the ticketing system. The support is largely problem oriented with each problem being time limited.