PBS USER GUIDE
- Overview
- Job Limits
- Submitting a Job
- Checking Job Status
- Deleting a Job
- Viewing Job Output
- More Information
Overview
PBS is a job resource manager. A job is a computational task such as a simulation or a data analysis. PBS provides job queuing and execution services in a batch cluster environment. On the HPC systems, PBS works with the Moab job scheduler: PBS provides job information to Moab, and Moab tells PBS which jobs to run and on which compute nodes of a cluster to run them.
Job Limits
PBS is configured on each system with a number of separate job queues. Each system has a default queue that every user can access, and each funding group has its own queue. These limits are placed on funding group queues:
Limit | Lion-X* |
---|---|
Maximum Walltime (Hours) | 96 (default) to 336+ |
Maximum Processors in Use Per User | No Limit |
Maximum Job Size (Processors) | 32 |
If you believe you belong to a funding group on a system but do not know which queue to use, please email us at support@ics.psu.edu stating which group you belong to and that you need to know your group queue.
These limits are placed on system default queues:
Limit | Lion-X* | Lion-XK | CyberStar | Clsf |
---|---|---|---|---|
Maximum Walltime (Hours) | 24 | 24 | 96 | No Limit |
Maximum Processors in Use Per User | 32 | No Limit | No Limit | No Limit |
Maximum Job Size (Processors) | 32 | 512 (nodes=64:ppn=8) | 256 (nodes=32:ppn=8) | 128 (nodes=1:ppn=128) |
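The job-size limits above are given in processors, while a request is written as nodes=N:ppn=M, so the size of a request is N times M. The following is a minimal shell sketch of that arithmetic. The function name fits_job_limit is hypothetical and is not part of PBS, and the second request is an illustrative value rather than a recommended one.

```shell
#!/bin/sh
# fits_job_limit NODES PPN MAX_PROCS   (hypothetical helper, not a PBS command)
# Succeeds when NODES * PPN is at most MAX_PROCS.
fits_job_limit() {
    requested=$(( $1 * $2 ))
    [ "$requested" -le "$3" ]
}

# Lion-XK default queue: maximum job size 512 processors (nodes=64:ppn=8).
if fits_job_limit 64 8 512; then
    echo "nodes=64:ppn=8 fits within 512 processors"
fi

# Lion-X* default queue: maximum job size 32 processors.
# nodes=8:ppn=8 is an illustrative request of 64 processors.
if ! fits_job_limit 8 8 32; then
    echo "nodes=8:ppn=8 exceeds 32 processors"
fi
```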
Submitting a Job
Jobs are submitted to a PBS queue so that PBS can dispatch them to run on one or more of a cluster's compute nodes. There are two main types of PBS jobs:
- Non-interactive Batch Jobs: this is the most common type of PBS job. A job script is created that contains PBS resource requests and the commands necessary to execute the job. The job script is then submitted to PBS to be run non-interactively.
- Interactive Batch Jobs: this is a way to get an interactive terminal on one or more of the compute nodes of a cluster. Commands can then be run interactively through that terminal directly on the compute nodes for the duration of the job. Interactive jobs are helpful for such things as program debugging and running many short jobs.
Non-interactive Batch Jobs
There are two steps to running a non-interactive batch job:
- Create a PBS Script
A PBS script is a standard Unix/Linux shell script that contains a few extra comments at the beginning that specify directives to PBS. These comments all begin with #PBS. The most important PBS directives are:
Definition of Important PBS Directives
PBS Directive | Description |
---|---|
#PBS -l walltime=HH:MM:SS | This directive specifies the maximum walltime (real time, not CPU time) that a job should take. If this limit is exceeded, PBS will stop the job. Keeping this limit close to the actual expected time of a job can allow a job to start more quickly than if the maximum walltime is always requested. |
#PBS -l pmem=SIZEgb | This directive specifies the maximum amount of physical memory used by any process in the job. For example, if the job would run four processes and each would use up to 2 GB (gigabytes) of memory, then the directive would read #PBS -l pmem=2gb. The default for this directive on Lion-XF and Lion-LSP is 1 GB (gigabyte) of memory. Other Lion clusters do not currently set a default. |
#PBS -l nodes=N:ppn=M | This specifies the number of nodes (nodes=N) and the number of processors per node (ppn=M) that the job should use. PBS treats a processor core as a processor, so a system with eight cores per compute node can have ppn=8 as its maximum ppn request. Note that unless a job has some inherent parallelism of its own through something like MPI or OpenMP, requesting more than a single processor on a single node is usually wasteful and can impact the job start time. |
#PBS -q queuename | This specifies what PBS queue a job should be submitted to. This is only necessary if a user has access to a special queue. This option can and should be omitted for jobs being submitted to a system's default queue. |
#PBS -j oe | Normally when a command runs it prints its output to the screen. This output is often normal output and error output. This directive tells PBS to put both normal output and error output into the same output file. |
The following is an example PBS script.
    # This is a sample PBS script. It will request 1 processor on 1 node
    # for 4 hours.
    #
    # Request 1 processor on 1 node
    #
    #PBS -l nodes=1:ppn=1
    #
    # Request 4 hours of walltime
    #
    #PBS -l walltime=4:00:00
    #
    # Request 1 gigabyte of memory per process
    #
    #PBS -l pmem=1gb
    #
    # Request that regular output and error output go to the same file
    #
    #PBS -j oe
    #
    # The following is the body of the script. By default,
    # PBS scripts execute in your home directory, not the
    # directory from which they were submitted. The following
    # line places you in the directory from which the job
    # was submitted.
    #
    cd $PBS_O_WORKDIR
    #
    # Now we want to run the program "hello". "hello" is in
    # the directory that this script is being submitted from,
    # $PBS_O_WORKDIR.
    #
    echo " "
    echo " "
    echo "Job started on `hostname` at `date`"
    ./hello
    echo " "
    echo "Job Ended at `date`"
    echo " "
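For a job that does have its own parallelism, the thread count can be taken from the resources PBS actually granted instead of being hard-coded. The following is a minimal sketch for an OpenMP program on one node, assuming a Torque-style PBS in which $PBS_NODEFILE contains one line per allocated processor. The helper count_procs and the program name hello_omp are hypothetical. Outside of PBS the sketch falls back to a single thread so that it can be tried on a login node.

```shell
#!/bin/sh
#PBS -l nodes=1:ppn=8
#PBS -l walltime=4:00:00
#PBS -l pmem=1gb
#PBS -j oe

# count_procs FILE: number of lines in a node file (hypothetical helper).
count_procs() {
    wc -l < "$1" | tr -d ' '
}

# Outside of PBS neither variable is set, so fall back to safe defaults.
cd "${PBS_O_WORKDIR:-.}" || exit 1
if [ -r "${PBS_NODEFILE:-}" ]; then
    OMP_NUM_THREADS=$(count_procs "$PBS_NODEFILE")
else
    OMP_NUM_THREADS=1
fi
export OMP_NUM_THREADS

echo "Running with $OMP_NUM_THREADS thread(s)"
# ./hello_omp   # hypothetical OpenMP program; replace with your own
```

Deriving the thread count from the node file keeps the program in step with the ppn value in the directive, so only one line needs to change when the request changes.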
- Submit the PBS Script to PBS for Execution
Once a PBS script is created, it needs to be submitted to PBS so that it becomes eligible to be run. The command to submit a script to PBS is called qsub. The syntax of qsub is:
    qsub scriptfile
The following is an example of using qsub to submit a PBS script called myjob.
    % qsub myjob
    95.lionxj.rcc.psu.edu
The job script myjob has just been submitted to PBS and has been assigned the Job_ID 95.lionxj.rcc.psu.edu. This Job_ID can later be used to control the job.
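Because qsub prints the Job_ID on standard output, a script can capture it for later use with other PBS commands. The following is a minimal sketch. The helper job_number is hypothetical, and the fallback value is simply the Job_ID from the example, used so that the sketch also runs on a machine without PBS.

```shell
#!/bin/sh
# job_number JOB_ID: strip the server suffix from a full Job_ID
# (hypothetical helper, not a PBS command).
job_number() {
    printf '%s\n' "${1%%.*}"
}

# Submit only where qsub and the script exist, so the sketch is harmless elsewhere.
if command -v qsub >/dev/null 2>&1 && [ -f myjob ]; then
    JOB_ID=$(qsub myjob)
else
    JOB_ID=95.lionxj.rcc.psu.edu    # the Job_ID from the example
fi

echo "Full Job_ID: $JOB_ID"
echo "Job number:  $(job_number "$JOB_ID")"
```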
Interactive Batch Jobs
Interactive PBS jobs are similar to non-interactive PBS jobs in that they are submitted to PBS via the command qsub. Submitting an interactive PBS job differs from a non-interactive PBS job in that a PBS script is not necessary. All PBS directives can be specified on the command line. The syntax for qsub for submitting an interactive PBS job is:
qsub -I ... pbs directives ...
The -I flag above tells qsub that this is an interactive job. The following example shows using qsub to submit an interactive job using one processor on one node for four hours.
    lionxi:~$ qsub -I -l nodes=1:ppn=1 -l walltime=4:00:00
    qsub: waiting for job 1064159.lionxi.rcc.psu.edu to start
    qsub: job 1064159.lionxi.rcc.psu.edu ready
    lionxi25:~$
There are two things of note here. The first is that qsub does not return immediately when given the -I flag. Instead, it waits until the job is started and gives a prompt on the first compute node assigned to the job. The second thing of note is the prompt lionxi25:~$, which shows that commands are now being executed on the compute node lionxi25.
Checking Job Status
The command to check job status is qstat. qstat has many options. Some common ones are:
PBS Commands for Checking Job Status
Command Name | Description of Command Functionality |
---|---|
qstat | Shows the status of all PBS jobs. The time displayed is the CPU time used by the job. |
qstat -s | Shows the status of all PBS jobs. The time displayed is the walltime used by the job. |
qstat -u userid | Shows the status of all PBS jobs submitted by the user userid. The time displayed is the walltime used by the job. |
qstat -n | Shows the status of all PBS jobs along with a list of compute nodes that each job is running on. |
qstat -f jobid | Shows detailed information about the job jobid. |
A job can be in several different states. The most common ones are:
State | Meaning |
---|---|
Q | The job is queued and is waiting to start. |
R | The job is currently running. |
E | The job is currently ending. |
H | The job has a user or system hold on it and will not be eligible to run until the hold is removed. |
- Example: qstat output
    lionxj:~$ qstat
    Job id        Name           User     Time Use S Queue
    ------------- -------------- -------- -------- - -----
    10.lionxj     sparse         abc123   188:20:2 R lionxj
    11.lionxj     test           jwh128   00:00:18 R lionxj-admin
    ...
- Job id: the job's unique identifier
- Name: name of the job
- User: user that owns the job
- Time Use: CPU time used by the job
- S: state of the job
- Queue: the queue the job is in
- Example: qstat -s output
    lionxj:~$ qstat -s
    lionxj.rcc.psu.edu:
                                                            Req'd  Req'd   Elap
    Job ID        Username Queue    Jobname  SessID NDS TSK Memory Time  S Time
    ------------- -------- -------- -------- ------ --- --- ------ ----- - -----
    10.lionxj.rcc abc123   lionxj   sparse     5793   4  --    2gb 190:0 R 189:2
       --
    11.lionxj.rcc jwh128   lionxj-a test      11946   3  --     -- 500:0 R 166:5
       --
    ...
- Job ID: the job's unique identifier
- Username: user that owns the job
- Queue: the queue the job is in
- Jobname: the name of the job
- NDS: the number of compute nodes the job is using
- Req'd Memory: the memory requested for the job
- Req'd Time: the walltime requested for the job
- S: the state of the job
- Elap Time: the elapsed walltime for the job
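The column layout of the default qstat output makes it straightforward to pull out a single field in a script. The following is a minimal sketch that reads the state of one job. The helper job_state is hypothetical, it assumes the default six-column format in which S is the second-to-last field, and the sample lines are copied from the default qstat example so that the sketch runs without PBS.

```shell
#!/bin/sh
# job_state JOB_ID: read default-format qstat output on standard input and
# print the S column for JOB_ID (hypothetical helper, not a PBS command).
job_state() {
    awk -v id="$1" '$1 == id { print $(NF - 1) }'
}

# Sample lines copied from the default qstat example.
sample='10.lionxj sparse abc123 188:20:2 R lionxj
11.lionxj test jwh128 00:00:18 R lionxj-admin'

printf '%s\n' "$sample" | job_state 10.lionxj    # prints R
```

On a system with PBS installed, the same helper could be fed live data with qstat | job_state 10.lionxj.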
Deleting a Job
The command to delete a job is qdel. Its syntax is "qdel Job_ID".
PBS Commands for Deleting Jobs
Command Name | Description of Command Functionality |
---|---|
qdel Job_ID | Deletes the job identified by Job_ID. |
qdel $(qselect -u username) | Deletes all jobs belonging to user username. |
- Example: deleting a job with Job_ID 10
    lionxj:~$ qdel 10
- Example: deleting all jobs belonging to user abc123
    lionxj:~$ qdel $(qselect -u abc123)
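Deleting every job a user owns cannot be undone, so it can help to preview the commands first. The following is a minimal sketch of a dry-run wrapper around qdel. The function delete_jobs and the DRY_RUN variable are hypothetical conventions and are not part of PBS.

```shell
#!/bin/sh
# delete_jobs JOB_ID...: run qdel on each Job_ID, or only print the commands
# when DRY_RUN=1 (hypothetical wrapper, not a PBS command).
delete_jobs() {
    for id in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "qdel $id"
        else
            qdel "$id"
        fi
    done
}

# Preview what would be deleted without touching any job.
DRY_RUN=1
delete_jobs 10 11
```

Once the preview looks right, setting DRY_RUN=0 and calling delete_jobs $(qselect -u abc123) would perform the real deletion.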
Viewing Job Output
By default PBS will write screen output from a job to the following files:
PBS Output Files
Output File Name | Contents of Output File |
---|---|
Jobname.oJob_ID | This file would contain the non-error output that would normally be written to the screen. |
Jobname.eJob_ID | This file would contain the error output that would normally be written to the screen. |
If the PBS directive #PBS -j oe is used in a PBS script, the non-error and the error output are both written to the Jobname.oJob_ID file.
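The output file names can be predicted from the job name and the Job_ID, which is useful when a script needs to find them later. The following is a minimal sketch. It assumes that only the numeric part of the Job_ID appears in the file name and that the job name defaults to the script name, which is the usual Torque behavior but should be checked on your system. The helper output_files is hypothetical.

```shell
#!/bin/sh
# output_files JOBNAME JOB_ID: print the default output and error file names.
# Assumes only the numeric part of the Job_ID appears in the file name.
# (hypothetical helper, not a PBS command)
output_files() {
    num=${2%%.*}
    echo "$1.o$num"
    echo "$1.e$num"
}

# Job name and Job_ID taken from the qsub example.
output_files myjob 95.lionxj.rcc.psu.edu
```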
More Information
More information on PBS and PBS scripts can be found in the man pages for qsub, pbs_resources, qstat, and qdel.
Ref: https://rcc.its.psu.edu/user_guides/system_utilities/pbs/