
This page describes how to monitor and obtain information about jobs that are queued, running, or have finished running at SHARCNET. This includes finding the nodes a job is running on, when it started running, whether or not it was suspended during the course of its execution, etc.

Job Scheduling

SHARCNET provides a unified interface to our various schedulers for users. This system is called SQ and you can read more about it in the Knowledge Base. The rest of this document assumes you are familiar with SQ.

Behind SQ, SHARCNET primarily deploys the LSF (/SLURM) and Torque/Moab job schedulers and resource managers. These platforms do not behave identically, so a user who wants to work on the command line of a particular machine must be familiar with the scheduler it runs in order to understand how their jobs are handled.
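
If you are not sure which scheduler a particular cluster uses, one quick and informal check is to see which scheduler client commands are available on its login node, eg.

which bhist     # typically present on LSF systems
which qstat     # typically present on Torque/Moab systems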

Job Output File Behavior

On Torque/Moab systems, job output is spooled and only copied to the final destination file when the job completes. On LSF systems, users can view their job output while the job is executing, modulo any buffering at the runtime/OS level.
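
For example, if your job writes its output to date.out.wh (as in the sqsub example below), on an LSF system you can follow the file with the standard tail command while the job runs:

tail -f date.out.wh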

Job Identifier

All jobs that are submitted to run on SHARCNET systems are assigned a unique job identifier value, sometimes referred to as a Job ID or just jobid. This value is used by the system's job scheduler to keep track of the job, and it is also used by the back-end SHARCNET job database to identify the job. Knowing the Job ID allows one to look up information both on the system's job scheduler and in the job database.

Finding the Job Identifier

At job submission

When one submits a job with sqsub the JobID is returned on the command line, eg.

[merz@wha781 ~]$ sqsub -r 10m -q serial -t -o date.out.wh date
submitted as jobid 3224233

In this case the jobid is 3224233.

While the job is queued, running or recently finished

This value will also be listed in the jobid column when one runs sqjobs, eg.

[merz@wha781 ~]$ sqjobs
  jobid queue state ncpus    prio nodes time command
------- ----- ----- ----- ------- ----- ---- -------
3224233  test     Q     1 333.312         5s date   
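
If you have many jobs in the system, one simple way to pick out a particular job's line is to filter the sqjobs output by jobid, eg.

sqjobs | grep 3224233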

After the job has finished

On requin (which runs LSF), one can find the jobid in the job output file (every job submitted with sqsub should use the -o flag to specify a job output file!). For example, searching an output file for the LSF job summary:

[merz@req769 ~]$ cat job_output.requin.out | grep "Subject: Job"
Subject: Job 3224233: <date> Done

On most other SHARCNET systems (which run Torque+Maui/Moab), one can find the jobid in the job output file as well:

[merz@orc-login1 ~]$ cat job_output.orca.out | grep "job id"
              job id: 3224233

Web Portal Jobs Database

As a job progresses from waiting in the queue, through to running and on to completion, information about the job is logged to the SHARCNET job database.

You can look at the information associated with any of your jobs by visiting your web portal activity page. This page is helpful if you can't remember the details about a job and have lost or deleted the output file. The jobs are currently (Oct 2013) listed in a tabular format at the bottom of the activity page as follows:

[Image: an example of the jobs listing found in a user's activity page in the web portal]

Note: you may have to change the configuration to see the details you are interested in.

By clicking on the values in the Job ID column in the jobs table one can access a job summary page that presents much of the same information provided by the bhist and qstat commands below.

Getting the system's view of a job

If a job is queued, running or recently completed you should be able to query the job scheduler on the system directly (either LSF or Torque/Moab) to get accurate and timely information about the state of your job.

If your job completed more than a couple of days ago then it was likely flushed out of the system's records and can only be found via the jobs database in the web portal.

If you don't know the jobid you can look through your job listing in the web portal to find it.

LSF

To find out information about a job on a particular system running the LSF job scheduler one can use the bhist command. For example:

[merz@wha781 ~]$ bhist -l 3224233 

Job <3224233>, User <merz>, Project <600>, Command <date>
Thu May 27 10:19:01: Submitted from host <wha781>, to Queue <test>, CWD <$HOME>
                     , Output File <date.out.wh>; 

 RUNLIMIT                
 10.0 min of wha781
Thu May 27 10:19:06: Dispatched to <wha75>;
Thu May 27 10:19:06: Starting (Pid 14217);
Thu May 27 10:19:07: Running with execution home </home/merz>, Execution CWD </
                     home/merz>, Execution Pid <14217>;
Thu May 27 10:19:07: Done successfully. The CPU time used is 0.0 seconds;
Thu May 27 10:19:07: Post job process done successfully; 

Summary of time in seconds spent in various states by  Thu May 27 10:19:07
  PEND     PSUSP    RUN      USUSP    SSUSP    UNKWN    TOTAL
  5        0        1        0        0        0        6           

You can see a number of things, including when it was submitted to the system, started, finished, exit states, compute nodes used, resource usage, etc. A job may be suspended periodically and that will also show up in this history.

On some systems (especially whale) the scheduler's job logs grow quickly and are turned over frequently. To find older jobs you may have to add the -n X flag to bhist, where X is an integer value; 10 or 30 usually suffices (the larger the number, the longer the command takes, as it searches through more log files).
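
For example, to search the last 30 log files for the job shown above:

bhist -n 30 -l 3224233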

Torque/Moab

To find further details about a job running on a system that uses the Torque/Moab job scheduler one should use the qstat command, eg.

[merz@hnd50 ~]$ qstat -f 194242
Job Id: 194242.hnd51
    Job_Name = date
    Job_Owner = merz@hnd50
    resources_used.cput = 00:00:00
    resources_used.mem = 0kb
    resources_used.vmem = 0kb
    resources_used.walltime = 00:00:00
    job_state = C
    queue = test
    server = hnd51
    Checkpoint = u
    ctime = Thu May 27 11:19:52 2010
    Error_Path = hnd50:/home/merz/date.e194242
    exec_host = hnd21/0
    Hold_Types = n
    Join_Path = oe
    Keep_Files = n
    Mail_Points = n
    mtime = Thu May 27 11:19:53 2010
    Output_Path = hnd50:/home/merz/date.out.ho
    Priority = 0
    qtime = Thu May 27 11:19:52 2010
    Rerunable = False
    Resource_List.cput = 00:10:00
    Resource_List.nodect = 1
    Resource_List.nodes = 1
    Resource_List.pvmem = 3072mb
    Resource_List.walltime = 00:10:00
    session_id = 12358
    Variable_List = PBS_O_HOME=/home/merz,PBS_O_LANG=C,PBS_O_LOGNAME=merz,
        PBS_O_PATH=~/bin/blast-2.2.21/bin/:/opt/sharcnet/vmd/current/bin:/opt
    <SNIP>  This is a long list .... </SNIP>
          OMP_NUM_THREADS=1,PBS_O_QUEUE=test
    etime = Thu May 27 11:19:52 2010
    exit_status = 1
    submit_args = -V -r n -j oe -o /home/merz/date.out.ho -j oe -q test -N dat
        e -d /home/merz -l walltime=0:10:00 -l cput=0:10:00 -l pvmem=3072m -m 
        n -l nodes=1 -
    start_time = Thu May 27 11:19:53 2010
    start_count = 1
    comp_time = Thu May 27 11:19:53 2010

This will show you all of the scheduler properties about the job (nodes used, resource limits, timestamps, etc.) as well as information concerning the execution environment of the job.
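
Since the full listing is fairly long, it is often convenient to filter it down to just the fields of interest, eg.

qstat -f 194242 | grep -E 'job_state|exit_status|resources_used'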

As with LSF, this data is purged fairly frequently, so your job may no longer be listed on the cluster; in that case you will only be able to find information about the job in the jobs database via the web portal.

Inspecting running jobs

In order of increasing complexity, the different ways to look at your job's underlying processes and the compute nodes it is using include:

  1. sqjobs -L command
  2. logging into the nodes directly and running diagnostic commands like ps and top
  3. looking at the ganglia plots for the nodes the job was running on at the given time

sqjobs -L

By running sqjobs with the -L option one can get a listing of all the processes involved in one's jobs, in the typical ps output format. For example:

[merz@wha780 merz]$ sqsub -t -q serial -r 10m -o mafft_test.1 mafft --auto /home/merz/input_sequences               
submitted as jobid 3229987
<wait for job to start...>
[merz@wha780 merz]$ sqjobs -L
  jobid hostid   pid state resident virtual %cpu command                        
------- ------ ----- ----- -------- ------- ---- -------------------------------
3229987      9 19728     R   478916  592436 99.4 ~merz/lib/mafft/disttbfast -b 6
tot_rss 467.7M tot_vsz 578.6M avg_pcpu 97.6% cur_pcpu 99.4% 

  jobid queue state ncpus    prio nodes time command                            
------- ----- ----- ----- ------- ----- ---- -----------------------------------
3229987  test     R     1 166.495  wha9  39s mafft --auto /home/merz/input_seque
2712 CPUs total, 16 idle, 2696 busy; 2149 jobs running; 0 suspended, 1533 queued.
[merz@wha780 merz]$ 

Only 1 serial job is running; it is using ~500MB of memory and 99.4% of a CPU on wha9.

For MPI jobs the process listing will include all processes participating in the job.
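
If you want to keep an eye on the process listing while a job runs (serial or MPI), one simple option is to refresh it periodically with the standard watch utility, eg.

watch -n 30 sqjobs -L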

logging into compute nodes

One can also log directly into the compute nodes and run ps and top to watch the job execute in real time. This is not generally recommended, and you should refrain from running anything other than simple diagnostic commands on the compute nodes. The list of nodes participating in the job can be found via the methods described above.
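
For example, if sqjobs -L showed your job's processes on wha9 (as above), a minimal inspection might look like the following (log out of the node as soon as you are done):

ssh wha9
top -u $USER        # interactive view of your processes on the node
ps -fu $USER        # one-shot process listing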

Ganglia

SHARCNET is currently in the process of bringing the Ganglia monitoring system online. If you know the time period during which your job was running and the nodes it was running on, you can inspect a wide variety of performance metrics that were collected while your job was executing. The main entry point for Ganglia is http://ganglia.sharcnet.ca. From there, you can reach timeline plots for individual nodes by clicking on a cluster and then on the node of interest; for example, one can bring up plots for kraken/narwhal node nar111 covering the last hour.

One thing to keep in mind about Ganglia data is that it only captures information at the node level. If your job was sharing the node with other jobs (which is common for serial jobs) then you will be looking at the aggregate performance of all jobs on the node at that time (to avoid this, request a full node for your job).