|Note: This page's content only applies to the SHARCNET's legacy systems; it does not apply to Graham.|
sqsub - submit a job
sqsub [-r|-W runtimelimit] [-o ofile] [-e efile] [-i ifile] [-t or --test] [-q queue] [-f flag] [-n ncpus] [-N nnodes] [--ppn processes-per-node] [--tpp threads-per-process] [--mpp memory-per-process] [--nodes nodespec] [--pack] [--gpp gpus-per-process] [-m|--mail|--mail-start|--mail-end] [-w jobids] [--project projectname] [-j jobname] [--nompirun] [-v] [-d] your_command -yourarg...
sqsub submits jobs to the SHARCNET queueing system.
-r|-W timelimit    (THIS IS REQUIRED FOR EVERY JOB) provide a runtime limit (elapsed, wallclock time, not summed across cpus) specified in any of the following forms:
    15      (assumed to be minutes)
    15m     (same)
    .25h    (same)
    2.5h    (2 hours 30 minutes)
    3.5d    (3 days 12 hours)
    84:0    (same, in LSF's hours:minutes format)
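All of these forms reduce to the same wallclock limit. As a rough illustration (this sketch is not part of sqsub; the suffixes are those listed above), a small shell function can convert each form to minutes:

```shell
#!/bin/sh
# Illustrative only: convert sqsub-style runtime strings to minutes.
# Handles the forms listed above: bare minutes, Nm, Nh, Nd, and H:M.
to_minutes() {
    echo "$1" | awk '
        /^[0-9.]+m$/ { sub(/m$/,"");  print $0 * 1;    next }  # minutes
        /^[0-9.]+h$/ { sub(/h$/,"");  print $0 * 60;   next }  # hours
        /^[0-9.]+d$/ { sub(/d$/,"");  print $0 * 1440; next }  # days
        /^[0-9]+:[0-9]+$/ { split($0, t, ":"); print t[1]*60 + t[2]; next }  # H:M
        { print $0 * 1 }                                       # bare number = minutes
    '
}

to_minutes 15      # -> 15
to_minutes 2.5h    # -> 150
to_minutes 3.5d    # -> 5040
to_minutes 84:0    # -> 5040
```

Note that 3.5d and 84:0 come out identical, as the option list above claims.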
-i ifile    job reads inputs from 'ifile' (no default)
-o ofile    job output goes to 'ofile' (REQUIRED FOR EVERY JOB)
-e efile    job errors go to 'efile' (default: same as -o)
-t|--test               'test' mode: short but immediate (preemptive)
-q queue                queue name (serial, threaded, mpi; default serial)
--project projectname   provide a project to which this job's usage will be accounted (by default, your account's sponsor's project will be used)
-f flag                 specify certain flags to modify behavior. Universal flags include: mpi, threaded, interactive, test, mail, permitcoredump. On some clusters, other flags have added meaning, such as xeon/opteron on Hound, dual/quad on Goblin, and selecting sub-clusters on Kraken (bal/bru/dol/meg/tig/wha/nar)
-n ncpus                require n cpus or cores (default 1)
-N nnodes               require n nodes (does not imply exclusive use)
--ppn=ppn               start ppn processes per node
--tpp=tpp               permit tpp threads per process (OMP_NUM_THREADS)
--gpp=gpp               allocate gpp gpus per process
--mpp=|--memperproc=    amount of memory required by each process. May be specified like 64M or 2.5G (M=2^20, G=2^30). For an MPI job, this is the per-rank size; for threaded jobs, it's the process size (that is, not per-thread)
--nodes=clu[1-4]        require a specific set of nodes, e.g. wha[1-4] or req666
--pack                  require a minimal number of nodes, so processes occupy all cpus per node
-w|--waitfor=jobid[,jobid...]  wait for a list of jobs to complete
--mail-start            notify when the job starts
--mail-end              notify when the job ends
-m|--mail               (compatibility - same as --mail-end). The destination email address is the one associated with your web-portal account
-j|--jobname            provides a name for the job
--idfile=fname          write the jobid into a file named 'fname'
--nompirun              don't automatically invoke mpirun for mpi jobs
-h|--help               show brief usage message
--man                   show man page
-v|--verbose            verbose mode: shows debugging-type details
-d|--debug              debug mode: don't actually submit, but show the command
your_command needs to be an actual executable, whether given explicitly (like "./a.out") or found on your PATH variable.
sqsub acts like a prefix to your command, so to test a very simple job, you might just do:
sqsub --test -r5m -o myout ./myprogram
if your program takes arguments, place them after your command:
sqsub --test -r5m -o myout ./myprogram -x 32
Your program will see the "-x 32", but is unaware of the sqsub prefix; likewise, sqsub pays no attention to arguments after the executable name ("./myprogram"). Some more realistic examples:
sqsub -r 1d -o out.%J ./my_serial_program -x 32
sqsub -r 1d -o out.%J -q mpi -n 16 ./my_mpi_program -x 32
sqsub -r 1d -o out.%J -q threaded -n 4 ./my_openmp_program -x 32
sqsub -r 1d -o out.%J ./my_serial_program -x 32 -o foo
The '%J' within output or error filenames is expanded to the jobid (number), so in the examples above, the output filename would be like "out.123".
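The substitution itself is performed by the scheduler, not by you, but it can be mimicked for illustration (the jobid 123 here is hypothetical):

```shell
# Illustrative only: mimic the scheduler's %J expansion for a hypothetical jobid.
jobid=123
template="out.%J"
ofile=$(echo "$template" | sed "s/%J/$jobid/")
echo "$ofile"   # -> out.123
```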
Please use the -i, -o, and -e switches to direct file I/O to files. You MUST provide some indication of where job and scheduler output should go. Jobs submitted without -o will be rejected, but if you truly want to discard all output, you may specify "-o /dev/null" (or "-o none" for convenience).
Job output will be appended to the specified file; in most cases, a short epilogue will also be appended, providing some information about the job's resource usage. Please avoid using the same output file for multiple jobs, since concurrent writes to a file can cause significant locking overhead when the jobs are on different hosts.
There are three standard SHARCNET queues:
serial:   one processor.
threaded: one node, >1 cpus, as with OpenMP or pthreads.
mpi:      any number of processors/nodes using MPI.
These are really types or categories of jobs. Any of these kinds of jobs can also be run in "test mode", which is purely for testing (debugging or checking to see that a job will start properly), and is limited to 1 hour. There may also be some cluster-specific queues (such as 'gaussian' for using the SHARCNET Gaussian license.)
To allocate multiple processors, you must specify at least one of the following: ncpus, nnodes, ppn. For instance, "-N 4 --ppn 2" implies ncpus=8, but "-n 8" alone may result in the job being spread across anywhere from 1 to 8 nodes. To the scheduler, a job with "-n 8" is easier to find resources for than a job that is more specific, such as "-n 8 --ppn 4" (the latter requires exactly 2 nodes with 4 free processors each).
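The arithmetic behind that example can be written out directly (numbers taken from the example above):

```shell
# Illustrative arithmetic: total CPUs implied by -N (nodes) and --ppn (processes per node).
nnodes=4
ppn=2
ncpus=$((nnodes * ppn))
echo "$ncpus"   # -> 8, matching "-N 4 --ppn 2" implying ncpus=8
```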
Hybrid jobs are those which use both MPI and threading (usually OpenMP). They differ from MPI jobs only slightly: you should specify "--tpp" to determine how many threads are required by each MPI rank.
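For example, a hybrid job with 4 MPI ranks and 8 OpenMP threads per rank (these numbers are hypothetical) would be submitted with something like "sqsub -r 1d -o out.%J -q mpi -n 4 --tpp 8 ./my_hybrid_program", and occupies cores as follows:

```shell
# Illustrative arithmetic for a hypothetical hybrid job:
# "-q mpi -n 4 --tpp 8" means 4 MPI ranks, each permitted 8 threads
# (the thread count is exported to the job as OMP_NUM_THREADS).
ranks=4
tpp=8
total=$((ranks * tpp))
echo "$total"   # -> 32 cores in total
```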
Jobs operate with reserved memory, given by the --mpp switch. A job cannot exceed this amount: when it attempts to allocate memory beyond the limit, the allocation will fail. (Programs should check the return value of memory allocations - this is good coding practice in any environment.) You should determine how much memory your program will use before submitting production runs. You can do this by inspection (calculating the total extent of memory used by the program, including code, shared libraries, stack space, etc.), or you can overestimate but monitor the program's actual use (sqjobs -L $jobid is useful for this: the column marked "virtual" shows the memory footprint of each process in kilobytes).
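To translate the kilobyte figure from sqjobs into an --mpp value, note that --mpp uses binary units (M=2^20, G=2^30 bytes), so 1G corresponds to 2^20 kilobytes. A hypothetical "virtual" reading of 2621440 kB works out as:

```shell
# Illustrative only: convert a "virtual" column reading (kilobytes, as printed
# by sqjobs -L) into gigabytes for use with --mpp (G = 2^30 bytes = 2^20 kB).
virtual_kb=2621440    # hypothetical value read from sqjobs -L
gb=$(awk "BEGIN { printf \"%.1f\", $virtual_kb / 1048576 }")
echo "$gb"   # -> 2.5, so --mpp=2.5G (plus some safety margin) would be reasonable
```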
If no --mpp parameter is supplied, a system-specific default will be used (and a warning message printed). The default is basically the minimum per-processor share of all the compute nodes in the cluster, so on a cluster with 24-processor, 32G nodes, the default is 1GB.
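The per-processor-share arithmetic behind that default can be sketched as follows (the 24-processor, 32G figures are from the example above; actual defaults are system-specific, and the share is rounded down to whole gigabytes here for illustration):

```shell
# Illustrative only: the default --mpp as the per-processor share of a node,
# using the example figures above (32G nodes with 24 processors each).
node_mem_gb=32
procs_per_node=24
default_gb=$((node_mem_gb / procs_per_node))   # integer (rounded-down) share
echo "${default_gb}G"   # -> 1G
```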
Flags: mpi, threaded, nompirun, mail, usage, permitcoredump, dostack