Note: This page's content applies only to SHARCNET's legacy systems; it does not apply to Graham.


sqsub - submit a job


sqsub [-r|W runtimelimit] 
[-o ofile] [-e efile] [-i ifile] 
[-t or --test] [-q queue] [-f flag] 
[-n ncpus] [-N nnodes] 
[--ppn processes-per-node] [--tpp threads-per-process] [--mpp memory-per-process] 
[--nodes nodespec] [--pack] 
[--gpp gpus-per-process] 
[-w jobids]
[--project projectname]
[-j jobname] 
  your_command -yourarg...


sqsub submits jobs to the SHARCNET queueing system.


-r|W runtimelimit
               provide a runtime limit (elapsed, wallclock time, not summed 
               across cpus) specified in any of the following forms:
       15      (assumed to be minutes)
       15m     (same)
       .25h    (same)
       2.5h    (2 hours 30 minutes)
       3.5d    (3 days 12 hours)
       84:0    (same, in LSF's hours:minutes format)
-i ifile       job reads inputs from 'ifile' (no default)
-o ofile       job output to 'ofile' (REQUIRED FOR EVERY JOB)
-e efile       job errors go to 'efile' (default: same as -o)
-t|--test      'test' mode: short but immediate (preemptive)
-q queue       queue name (serial, threaded, mpi; default serial)
--project projectname
               provide a project to which this job's usage will be accounted.
               (by default, your account's sponsor's project will be used.)
-f flag        specify certain flags to modify behavior.  flags include:
               mpi, interactive, test, mail, permitcoredump
-n ncpus       require n cpus or cores (default 1)
-N nnodes      require n nodes (does not imply exclusive use)
--ppn=ppn      start ppn processes per node
--tpp=tpp      permit tpp threads per process (OMP_NUM_THREADS)
--gpp=gpp      allocate gpp gpus per process
--memperproc=  amount of memory required by each process.  may be specified
               like 64M or 2.5G (M=2^20, G=2^30).  for an MPI job, this is 
               the per-rank size.  for threaded jobs, it's the process size,
               (that is, not per-thread.)
--nodes nodespec
               require a specific set of nodes.  e.g. wha[1-4] or req666.
--pack         require a minimal number of nodes, so processes occupy
               all cpus per node.   
-w jobids      wait for a list of jobs to complete
--mail-start   notify when the job starts.
--mail-end     notify when the job ends.
-m|--mail      (compatibility - same as mail-end)
               the destination email address is the one associated with 
               your web-portal account.
-j|--jobname   provides a name for the job.
--idfile=fname write the jobid into a file named 'fname'.
--nompirun     don't automatically invoke mpirun for mpi jobs
-f flag        specify certain flags to modify behavior.  
               Universal flags include: mpi, threaded, test, mail
               on some clusters, other flags have added meaning, such as
               xeon/opteron on Hound, dual/quad on Goblin, and 
               selecting sub-clusters on Kraken (bal/bru/dol/meg/tig/wha/nar)
-h or --help   show brief usage message
--man          show man page
-v|--verbose   verbose mode: shows debugging-type details
-d|--debug     debug mode: don't actually submit, but show the command
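
The runtime-limit forms listed above all reduce to a number of minutes. As a rough illustration of how they relate, the sketch below converts each form to minutes. The to_minutes function is a hypothetical helper written for this page, not part of sqsub, and it does not claim to mirror sqsub's own parsing.

```shell
#!/bin/sh
# to_minutes LIMIT: hypothetical helper (not part of sqsub) that converts a
# runtime limit in one of the documented forms to minutes, rounded to the
# nearest whole minute.
to_minutes() {
    case "$1" in
        # hours:minutes, e.g. 84:0
        *:*) echo "$1" | awk -F: '{ printf "%.0f\n", $1 * 60 + $2 }' ;;
        # days, e.g. 3.5d (1440 minutes per day)
        *d)  echo "${1%d}" | awk '{ printf "%.0f\n", $1 * 1440 }' ;;
        # hours, e.g. 2.5h or .25h
        *h)  echo "${1%h}" | awk '{ printf "%.0f\n", $1 * 60 }' ;;
        # explicit minutes, e.g. 15m
        *m)  echo "${1%m}" ;;
        # a bare number is assumed to be minutes
        *)   echo "$1" ;;
    esac
}

for limit in 15 15m .25h 2.5h 3.5d 84:0; do
    printf '%-6s -> %s minutes\n' "$limit" "$(to_minutes "$limit")"
done
```

Here 3.5d and 84:0 both come out as 5040 minutes, matching the "same" annotation in the list.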


your_command needs to be an actual executable, whether given explicitly (like "./a.out") or found on your PATH variable.

sqsub acts like a prefix to your command, so to test a very simple job, you might just run:

       sqsub --test -r5m -o myout ./myprogram

If your program takes arguments, place them after your command:

       sqsub --test -r5m -o myout ./myprogram -x 32

Your program will see the "-x 32" but is unaware of the sqsub prefix; likewise, sqsub pays no attention to arguments after the executable name ("./myprogram"). Some more realistic examples:

       sqsub -r 1d -o out.%J ./my_serial_program -x 32
       sqsub -r 1d -o out.%J -q mpi -n 16  ./my_mpi_program -x 32
       sqsub -r 1d -o out.%J -q threaded -n 4 ./my_openmp_program -x 32
       sqsub -r 1d -o out.%J ./my_serial_program -x 32 -o foo

The '%J' within output or error filenames is expanded to the jobid (number), so in the examples above, the output filename would be like "out.123".

Please use the -i, -o and -e switches to direct file I/O to files. You MUST provide some indication of where job and scheduler output should go. Jobs submitted without -o will be rejected, but if you truly want to discard all output, you may specify "-o /dev/null" (or "-o none" for convenience).

Job output will be appended to the specified file; in most cases, a short epilogue will also be appended, providing some information about the job's resource usage. Please avoid using the same output file for multiple jobs, since concurrent writes to a file can cause significant locking overhead when the jobs are on different hosts.
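
One way to keep output files distinct when submitting a batch of similar jobs is to build each command with its own -o name. The sketch below only prints the commands so they can be inspected first. The input file names, the ".dat" suffix and ./my_serial_program are hypothetical placeholders, and build_cmd is a helper invented for this example.

```shell
#!/bin/sh
# build_cmd INPUT: print an sqsub command whose output file name includes
# both the input's base name and %J. The %J is left literal here; it is
# expanded to the jobid at submission, as described above.
build_cmd() {
    base=$(basename "$1" .dat)
    echo "sqsub -r 1d -o out.$base.%J ./my_serial_program $1"
}

# Hypothetical input files; replace with your own.
for input in case1.dat case2.dat case3.dat; do
    build_cmd "$input"
done
```

Once the printed commands look right, they can be run directly, so that every job appends to a file of its own.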

There are three standard SHARCNET queues:

       serial: one processor.
       threaded: one node, >1 cpus, as with OpenMP or pthreads.
       mpi: any number of processors/nodes using MPI

These are really types or categories of jobs. Any of these kinds of jobs can also be run in "test mode", which is purely for testing (debugging or checking to see that a job will start properly), and is limited to 1 hour. There may also be some cluster-specific queues (such as 'gaussian' for using the SHARCNET Gaussian license.)

To allocate multiple processors, you must specify at least one of the following: ncpus, nnodes, ppn. For instance, "-N 4 --ppn 2" implies ncpus=8, but "-n 8" alone may result in the job being spread across anywhere from 1 to 8 nodes. To the scheduler, a job with "-n8" is easier to find resources for than a job that is more specific, such as "-n8 --ppn 4" (the latter requires exactly 2 nodes with 4 free processors each).
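
The arithmetic behind these two examples can be checked with a couple of lines of shell. The helper names below are made up for illustration, and nodes_needed assumes ncpus is an exact multiple of ppn, as in the "-n8 --ppn 4" case.

```shell
#!/bin/sh
# total_cpus NNODES PPN: cpus implied by "-N NNODES --ppn PPN"
total_cpus() {
    echo $(( $1 * $2 ))
}

# nodes_needed NCPUS PPN: nodes implied by "-n NCPUS --ppn PPN"
# (assumes NCPUS is an exact multiple of PPN)
nodes_needed() {
    echo $(( $1 / $2 ))
}

echo "-N 4 --ppn 2  implies ncpus=$(total_cpus 4 2)"
echo "-n 8 --ppn 4  implies nodes=$(nodes_needed 8 4)"
```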

Hybrid jobs are those which use both MPI and threading (usually OpenMP). They differ from MPI jobs only slightly: you should specify "--tpp" to determine how many threads are required by each MPI rank.

Jobs operate with reserved memory, given by the --mpp switch. A job cannot exceed this amount: when it attempts to allocate memory that would exceed the limit, the allocation will fail. (Programs should check the return value of memory allocations; this is good coding practice in any environment.) You should determine how much memory your program will use before submitting production runs. You can do this by inspection (calculating the total extent of memory used by the program, including code, shared libraries, stack space, etc.), or you can overestimate and then monitor the program's actual use (sqjobs -L $jobid is useful for this: the column marked "virtual" shows the memory footprint of each process in kilobytes).

If no --mpp parameter is supplied, a system-specific default will be used (and a warning message printed). The default is basically the minimum per-processor share of all the compute nodes in the cluster, so on a cluster with 24-processor, 32G nodes, the default is 1GB.
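
To make the size suffixes concrete (M=2^20, G=2^30), the sketch below converts a memory-per-process value such as 64M or 2.5G to bytes. The to_bytes function is a hypothetical helper for illustration only; it is not part of sqsub and handles only the two suffixes documented here.

```shell
#!/bin/sh
# to_bytes SIZE: hypothetical helper (not part of sqsub) that converts a
# size written like 64M or 2.5G to bytes, using M=2^20 and G=2^30.
to_bytes() {
    case "$1" in
        *G) echo "${1%G}" | awk '{ printf "%.0f\n", $1 * 1073741824 }' ;;
        *M) echo "${1%M}" | awk '{ printf "%.0f\n", $1 * 1048576 }' ;;
        *)  echo "unsupported size: $1" >&2; return 1 ;;
    esac
}

for size in 64M 2.5G; do
    printf '%-5s = %s bytes\n' "$size" "$(to_bytes "$size")"
done
```

Since sqjobs -L reports the footprint in kilobytes, dividing these byte counts by 1024 gives a figure to compare against the "virtual" column.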

flags: mpi threaded nompirun mail usage permitcoredump dostack


See also: sqjobs, sqkill, sqsuspend, sqresume