Sometimes I have to put text on a path

Wednesday, April 1, 2015

PBS is a job resource manager. A job is defined as a computational task such as computational simulation or data analysis. PBS provides job queuing and execution services in a batch cluster environment.
On the HPC systems, PBS works with the Moab job scheduler. PBS provides job information to Moab, and Moab tells PBS which jobs to run and on which compute nodes in the cluster to run them.

Job Limits

PBS is configured on each system to have a number of separate job queues. There is a default queue on each system that every user has access to. Each funding group has its own queue.
These limits are placed on funding group queues:
Funding Group Queue Limits
  Maximum Walltime (Hours)              96 (default) to 336+
  Maximum Processors in Use Per User    No Limit
  Maximum Job Size (Processors)         32
Note that the walltime limits placed on funding group queues are arbitrary and can be adjusted at the request of the group's PI.
If you believe you are part of a funding group on a system and do not know which queue you should be using, please email us, stating which group you are part of and that you need to know your group queue.
These limits are placed on system default queues:
System Default Queue Limits (one column per system)
  Maximum Walltime (Hours)              24    24                      96                      No Limit
  Maximum Processors in Use Per User    32    No Limit                No Limit                No Limit
  Maximum Job Size (Processors)         32    512 (nodes=64:ppn=8)    256 (nodes=32:ppn=8)    128 (nodes=1:ppn=128)
Most queue limits can be checked by running the command qstat -q.
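
The limits above are stated in hours, while a walltime request is written as HH:MM:SS. As a minimal sketch of the conversion (hours_to_walltime is a hypothetical helper written for this example, not a PBS command):

```shell
#!/bin/sh
# hours_to_walltime is a hypothetical helper (not part of PBS).
# It converts a whole number of hours into the HH:MM:SS form
# used by "-l walltime=...".
hours_to_walltime() {
    printf '%d:00:00\n' "$1"
}

# The 96-hour default limit for funding group queues:
hours_to_walltime 96
# A 4-hour request:
hours_to_walltime 4
```

The result can be pasted into a walltime request, provided it stays within the limit of the queue being used.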

Submitting a Job

Jobs are submitted to a PBS queue so that PBS can dispatch them to be run on one or more of a cluster's compute nodes. There are two main types of PBS jobs:
  • Non-interactive Batch Jobs: this is the most common PBS job. A job script is created that contains PBS resource requests and the commands necessary to execute the job. The job script is then submitted to PBS to be run non-interactively.
  • Interactive Batch Jobs: this is a way to get an interactive terminal on one or more of the compute nodes of a cluster. Commands can then be run interactively through that terminal directly on the compute nodes for the duration of the job. Interactive jobs are helpful for tasks such as debugging programs and running many short jobs.

Non-interactive Batch Jobs

There are two steps to running a non-interactive batch job:
  1. Create a PBS Script
    A PBS script is a standard Unix/Linux shell script that contains a few extra comments at the beginning that specify directives to PBS. These comments all begin with #PBS. The most important PBS directives are:
    Definition of Important PBS Directives

    #PBS -l walltime=HH:MM:SS
        This directive specifies the maximum walltime (real time, not CPU time) that a job should take. If this limit is exceeded, PBS will stop the job. Keeping this limit close to the actual expected time of a job can allow a job to start more quickly than if the maximum walltime is always requested.

    #PBS -l pmem=SIZEgb
        This directive specifies the maximum amount of physical memory used by any process in the job. For example, if the job would run four processes and each would use up to 2 GB (gigabytes) of memory, then the directive would read #PBS -l pmem=2gb. The default for this directive on Lion-XF and Lion-LSP is 1 GB (gigabyte) of memory. Other Lion clusters do not currently set a default.

    #PBS -l nodes=N:ppn=M
        This specifies the number of nodes (nodes=N) and the number of processors per node (ppn=M) that the job should use. PBS treats a processor core as a processor, so a system with eight cores per compute node can have ppn=8 as its maximum ppn request. Note that unless a job has some inherent parallelism of its own through something like MPI or OpenMP, requesting more than a single processor on a single node is usually wasteful and can impact the job start time.

    #PBS -q queuename
        This specifies what PBS queue a job should be submitted to. This is only necessary if a user has access to a special queue. This option can and should be omitted for jobs being submitted to a system's default queue.

    #PBS -j oe
        Normally when a command runs it prints its output to the screen. This output is often normal output and error output. This directive tells PBS to put both normal output and error output into the same output file.

    The following is an example PBS script.
    # This is a sample PBS script. It will request 1 processor on 1 node
    # for 4 hours.
    #   Request 1 processor on 1 node
    #PBS -l nodes=1:ppn=1
    #   Request 4 hours of walltime
    #PBS -l walltime=4:00:00
    #   Request 1 gigabyte of memory per process
    #PBS -l pmem=1gb
    #   Request that regular output and error output go to the same file
    #PBS -j oe
    #   The following is the body of the script. By default,
    #   PBS scripts execute in your home directory, not the
    #   directory from which they were submitted. The following
    #   line places you in the directory from which the job
    #   was submitted.
    cd $PBS_O_WORKDIR
    #   Now we want to run the program "hello".  "hello" is in
    #   the directory that this script is being submitted from,
    #   $PBS_O_WORKDIR.
    echo " "
    echo " "
    echo "Job started on `hostname` at `date`"
    ./hello
    echo " "
    echo "Job Ended at `date`"
    echo " "
    Note that the above example script is for a non-MPI job. Information on how to write PBS scripts for MPI jobs can be found in the MPI software pages.
  2. Submit the PBS Script to PBS for Execution
    Once a PBS script is created, it needs to be submitted to PBS so that it becomes eligible to be run. The command to submit a script to PBS is called qsub. The syntax of qsub is:
    qsub scriptfile
    The following is an example of using qsub to submit a PBS script called myjob.
    % qsub myjob
    The job script myjob has now been submitted to PBS and has been assigned a Job_ID, which qsub prints when the job is accepted. This Job_ID can later be used to control the job.
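
The two steps above can be sketched in a single shell session. This is only an illustration: write_pbs_script is a hypothetical helper, the file name myjob and the program ./hello follow the examples above, and the submission is skipped on machines where qsub is not installed.

```shell
#!/bin/sh
# write_pbs_script is a hypothetical helper that prints a minimal
# PBS script. Arguments: nodes, ppn, walltime, command to run.
write_pbs_script() {
    cat <<EOF
#PBS -l nodes=$1:ppn=$2
#PBS -l walltime=$3
#PBS -j oe
cd \$PBS_O_WORKDIR
$4
EOF
}

# Step 1: create the script (1 processor on 1 node for 4 hours).
write_pbs_script 1 1 4:00:00 ./hello > myjob

# Step 2: submit it. qsub prints the Job_ID on standard output,
# so it can be captured in a variable for later use.
if command -v qsub >/dev/null 2>&1; then
    jobid=$(qsub myjob)
    echo "Submitted as $jobid"
else
    echo "qsub not found; wrote script to ./myjob"
fi
```

Capturing the Job_ID this way makes it easy to pass to other PBS commands later in the same session.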

Interactive Batch Jobs

Interactive PBS jobs are similar to non-interactive PBS jobs in that they are submitted to PBS via the command qsub. Submitting an interactive PBS job differs from a non-interactive PBS job in that a PBS script is not necessary. All PBS directives can be specified on the command line.
The syntax for qsub for submitting an interactive PBS job is:
qsub -I ... pbs directives ...
The -I flag above tells qsub that this is an interactive job. The following example shows using qsub to submit an interactive job using one processor on one node for four hours.
lionxi:~$ qsub -I -l nodes=1:ppn=1 -l walltime=4:00:00
qsub: waiting for job to start
qsub: job ready

lionxi25:~$

There are two things of note here. The first is that the qsub command doesn't exit when run with the interactive -I flag. Instead, it waits until the job is started and gives a prompt on the first compute node assigned to the job. The second is the prompt lionxi25:~$, which shows that commands are now being executed on the compute node lionxi25.
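
Besides looking at the prompt, a shell can check whether it is running inside an interactive job. The sketch below assumes the installed PBS sets the PBS_ENVIRONMENT variable to PBS_INTERACTIVE or PBS_BATCH inside a job, as TORQUE documents; job_kind is a hypothetical helper name.

```shell
#!/bin/sh
# job_kind is a hypothetical helper. It inspects PBS_ENVIRONMENT,
# which is assumed to be set by PBS inside a job to either
# PBS_INTERACTIVE or PBS_BATCH, and is unset elsewhere.
job_kind() {
    case "${PBS_ENVIRONMENT:-}" in
        PBS_INTERACTIVE) echo "interactive PBS job" ;;
        PBS_BATCH)       echo "non-interactive PBS job" ;;
        *)               echo "not inside a PBS job" ;;
    esac
}

job_kind
```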

Checking Job Status

The command to check job status is qstat. qstat has many options. Some common ones are:
PBS Commands for Checking Job Status
  qstat              Shows the status of all PBS jobs. The time displayed is the CPU time used by the job.
  qstat -s           Shows the status of all PBS jobs. The time displayed is the walltime used by the job.
  qstat -u userid    Shows the status of all PBS jobs submitted by the user userid. The time displayed is the walltime used by the job.
  qstat -n           Shows the status of all PBS jobs along with a list of the compute nodes that each job is running on.
  qstat -f jobid     Shows detailed information about the job jobid.
A job can be in several different states. The most common ones are:
PBS Job States
  Q    The job is queued and is waiting to start.
  R    The job is currently running.
  E    The job is currently ending.
  H    The job has a user or system hold on it and will not be eligible to run until the hold is removed.
  • Example: qstat output
    lionxj:~$ qstat
    Job id        Name           User     Time Use S Queue
    ------------- -------------- -------- -------- - -----
    10.lionxj     sparse         abc123   188:20:2 R lionxj
    11.lionxj     test           jwh128   00:00:18 R lionxj-admin

    • Job id: the job's unique identifier
    • Name: name of the job
    • User: user that owns the job
    • Time Use: CPU time used by the job
    • S: state of the job
    • Queue: the queue the job is in
  • Example: qstat -s output
    lionxj:~$ qstat -s 
                                                            Req'd  Req'd   Elap
    Job ID        Username Queue    Jobname  SessID NDS TSK Memory Time  S Time
    ------------- -------- -------- -------- ------ --- --- ------ ----- - -----
    10.lionxj.rcc abc123   lionxj   sparse   5793     4  --    2gb 190:0 R 189:2
    11.lionxj.rcc jwh128   lionxj-a test     11946    3  --    --  500:0 R 166:5

    • Job ID: the job's unique identifier
    • Username: user that owns the job
    • Queue: the queue the job is in
    • Jobname: the name of the job
    • NDS: the number of compute nodes the job is using
    • Req'd Memory: the memory requested for the job
    • Req'd Time: the walltime requested for the job
    • S: the state of the job
    • Elap Time: the elapsed walltime for the job
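
Because the default qstat output is column-oriented, it can be filtered with standard tools. The following sketch lists the ids of jobs in a given state; jobs_in_state is a hypothetical helper, and it assumes the default qstat layout shown in the first example above (two header lines, state in the fifth whitespace-separated field).

```shell
#!/bin/sh
# jobs_in_state is a hypothetical helper: it reads default-format
# qstat output on standard input, skips the two header lines, and
# prints the id of every job whose state (field 5) matches the
# requested state letter.
jobs_in_state() {
    awk -v state="$1" 'NR > 2 && $5 == state { print $1 }'
}

# Sample input copied from the qstat example above. On a real
# system this would be:  qstat | jobs_in_state R
jobs_in_state R <<'EOF'
Job id        Name           User     Time Use S Queue
------------- -------------- -------- -------- - -----
10.lionxj     sparse         abc123   188:20:2 R lionxj
11.lionxj     test           jwh128   00:00:18 R lionxj-admin
EOF
```

The field position differs in the qstat -s layout, so the helper would need adjusting for that format.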

Deleting a Job

The command to delete a job is qdel. Its syntax is "qdel Job_ID".
PBS Commands for Deleting Jobs
  qdel Job_ID                    Deletes the job identified by Job_ID.
  qdel $(qselect -u username)    Deletes all jobs belonging to user username.
  • Example: deleting a job with Job_ID 10
    lionxj:~$ qdel 10
  • Example: deleting all jobs belonging to user abc123
    lionxj:~$ qdel $(qselect -u abc123)
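
Since deleting every job for a user cannot be undone, it can help to preview the commands first. The sketch below only prints what would be run; preview_qdel is a hypothetical helper, and the job ids are the sample values used above.

```shell
#!/bin/sh
# preview_qdel is a hypothetical helper: it reads Job_IDs (one per
# line) on standard input and prints the qdel command that would be
# run for each one, without deleting anything.
preview_qdel() {
    while read -r jobid; do
        [ -n "$jobid" ] && echo "qdel $jobid"
    done
    return 0
}

# On a real system the ids would come from qselect, for example:
#   qselect -u abc123 | preview_qdel
printf '10.lionxj\n11.lionxj\n' | preview_qdel
```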

Viewing Job Output

By default PBS will write screen output from a job to the following files:

PBS Output Files
  Jobname.oJob_ID    This file contains the non-error output that would normally be written to the screen.
  Jobname.eJob_ID    This file contains the error output that would normally be written to the screen.
If the PBS directive #PBS -j oe is used in a PBS script, the non-error and the error output are both written to the Jobname.oJob_ID file.
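
The output file names can be worked out ahead of time from the job name and Job_ID. This sketch assumes TORQUE's convention that only the numeric part of the Job_ID (the text before the first dot) appears in the file name; output_files is a hypothetical helper, and the sample values come from the qstat example earlier.

```shell
#!/bin/sh
# output_files is a hypothetical helper: given a job name and a
# Job_ID, it prints the default output and error file names.
# Assumption: only the part of the Job_ID before the first "."
# is used in the file name.
output_files() {
    seq=${2%%.*}
    echo "$1.o$seq"
    echo "$1.e$seq"
}

output_files sparse 10.lionxj
```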

More Information

More information on PBS and PBS scripts can be found in the man pages for the commands qsub, pbs_resources, qstat, and qdel.

