Thursday, August 14, 2014

Using TORQUE to Submit and Monitor Jobs

PBS Shell Commands

When you submit and monitor jobs on compute clusters in the RCC, you are using TORQUE, a version of PBS (the Portable Batch System).
A popular open-source resource manager, TORQUE is used at thousands of research sites globally. With the commands TORQUE provides, you can allocate resources, schedule and manage job execution, and monitor the status of your jobs.
Some TORQUE commands are typed at the shell command line; others are embedded as directives in the shell script that runs your program. Here are the commonly used shell commands.

Frequently Used Shell Commands

Command   Description                          Basic Usage       Example
qsub      submit a PBS job                     qsub [script]     $ qsub job.pbs
qstat     show status of PBS batch jobs        qstat [job_id]    $ qstat 44
qdel      delete a PBS batch job               qdel [job_id]     $ qdel 44
qhold     hold a PBS batch job                 qhold [job_id]    $ qhold 44
qrls      release a hold on a PBS batch job    qrls [job_id]     $ qrls 44
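The commands in the table are often chained in a small wrapper script. The sketch below is one way to do that, assuming that qsub prints the new job ID on standard output (for example 44.headnode) and that qstat exits with a non-zero status once the server no longer knows the job; sites configured to keep completed jobs visible in state C would need an extra check. The function names submit_job and wait_for_job are hypothetical, not TORQUE commands.

```sh
# Hypothetical helpers (not part of TORQUE) wrapping qsub and qstat.

# Submit a job script, passing any extra qsub options through, and print
# the job ID that qsub writes to stdout. Returns non-zero if qsub fails
# or prints nothing.
submit_job() {
    jobid=$(qsub "$@") || return 1
    [ -n "$jobid" ] || return 1
    printf '%s\n' "$jobid"
}

# Poll qstat every N seconds (default 30) until it no longer reports the
# job. Assumes qstat exits non-zero for a job the server has forgotten.
wait_for_job() {
    while qstat "$1" >/dev/null 2>&1; do
        sleep "${2:-30}"
    done
}

# Example, run on a login node:
#   jobid=$(submit_job job.pbs) || exit 1
#   qhold "$jobid"      # keep the job from starting
#   qrls "$jobid"       # release it back to the scheduler
#   wait_for_job "$jobid" 60 && echo "$jobid has left the queue"
```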

Shell Commands to Check Queue and Job Status

qstat -q              list all queues
qstat -a              list all jobs
qstat -au userid      list jobs for userid
qstat -r              list running jobs
qstat -f job_id       list full information about job_id
qstat -Qf queue       list full information about queue
qstat -B              list summary status of the job server
pbsnodes              list status of all compute nodes
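The output of qstat -a and qstat -au is a fixed-width table meant for people, but it can be summarized with awk. This sketch counts one user's jobs by state letter (for example Q for queued, R for running, H for held). It assumes the default qstat -a layout, in which job rows follow a line of dashes and the state is the tenth whitespace-separated column; check that against the output on your cluster before relying on it. The function name count_jobs_by_state is hypothetical.

```sh
# Hypothetical helper: count a user's jobs by state letter.
# Assumes the default `qstat -a` layout: a header, a separator line of
# dashes, then one row per job with the state (S) in column 10.
count_jobs_by_state() {
    qstat -au "${1:-$USER}" | awk '
        /^-+/                 { in_jobs = 1; next }  # separator ends the header
        in_jobs && NF >= 10   { count[$10]++ }       # tally the state column
        END { for (s in count) print s, count[s] }'
}

# Example (awk prints states in no particular order, so sort them):
#   count_jobs_by_state "$USER" | sort
```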

For complete documentation on any of these commands, see its man page: type man followed by the command name at the shell prompt (for example, man qstat).

#PBS in your Job Script

The best way to control how your job runs is with #PBS directives embedded in the job script. The job script is any shell script you would normally run to execute your programs. Because each #PBS line begins with #, the shell treats it as a comment; when the script is submitted to the PBS job scheduler with qsub, however, these lines set job attributes and select scheduler options.
Basic #PBS commands

#PBS -N myjob                   Set the job name
#PBS -m ae                      Send mail when the job aborts (a) or ends (e)
#PBS -M your@email.address      Send the mail to this address
#PBS -l nodes=4                 Allocate the specified number of nodes
#PBS -l walltime=1:00:00        Request a maximum run time (HH:MM:SS); the job is stopped if it runs longer
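To check which lines qsub will treat as directives, it can help to list them. The hypothetical helper below approximates the rule noted in the sample script on this page: #PBS lines are read only until the first shell command, and a line such as ### #PBS -V is an ordinary comment because it does not start with #PBS. It is an illustration, not qsub's actual parser, which handles more cases.

```sh
# Hypothetical helper: print the #PBS directives qsub would see in a job
# script, approximating the rule that directives must come before the
# first shell command.
list_pbs_directives() {
    awk '
        /^#PBS/                    { print; next }  # active directive
        /^[ \t]*#/ || /^[ \t]*$/   { next }         # other comment or blank line
                                   { exit }         # first shell command: stop
    ' "$1"
}

# Example:
#   list_pbs_directives myjob-pbs.sh
```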

Download a PBS Template Script

Download an example script that includes descriptive comments: Example PBS Job Script


# Sample PBS job script
#
# Copy this script, customize it and then submit it with the ``qsub''
# command. For example:
#
# cp pbs-template.sh myjob-pbs.sh
# {emacs|vi} myjob-pbs.sh
# qsub myjob-pbs.sh
#
# PBS directives are fully documented in the ``qsub'' man page. Directives
# may be specified on the ``qsub'' command line, or embedded in the
# job script.
#
# For example, if you want the batch job to inherit all your environment
# variables, use the ``V'' switch when you submit the job script:
#
# qsub -V myjob-pbs.sh
#
# or uncomment the following line by removing the initial ``###''
### #PBS -V

# Note: group all PBS directives at the beginning of your script.
# Any directives placed after the first shell command will be ignored.

### Set the job name
#PBS -N myjob

### Run in the queue named "batch"
#PBS -q batch

### Use the Bourne shell
#PBS -S /bin/sh

### Remove only the three initial "#" characters before #PBS
### in the following lines to enable:
###
### To send email when the job is completed:
### #PBS -m ae
### #PBS -M your@email.address

### Optionally set the destination for your program's output
### Specify localhost and an NFS filesystem to prevent file copy errors.
### #PBS -e localhost:$HOME/myjob.err
### #PBS -o localhost:$HOME/myjob.log

### Specify the number of cpus for your job.  This example will allocate 4 cores
### using 2 processors on each of 2 nodes.
### #PBS -l nodes=2:ppn=2

### Tell PBS how much memory you expect to use. Use units of 'b', 'kb', 'mb' or 'gb'.
### #PBS -l mem=256mb

### Tell PBS the anticipated run-time for your job, where walltime=HH:MM:SS
### #PBS -l walltime=1:00:00

### Switch to the working directory; by default TORQUE launches processes
### from your home directory.
cd "$PBS_O_WORKDIR" || exit 1
echo Working directory is "$PBS_O_WORKDIR"

# Calculate the number of processors allocated to this run.
NPROCS=$(wc -l < "$PBS_NODEFILE")

# Calculate the number of nodes allocated (sort first so that repeated
# host names do not need to be adjacent in the node file).
NNODES=$(sort -u "$PBS_NODEFILE" | wc -l)

### Display the job context
echo Running on host `hostname`
echo Time is `date`
echo Directory is `pwd`
echo Using ${NPROCS} processors across ${NNODES} nodes

### OpenMPI will automatically launch processes on all allocated nodes.
## MPIRUN=`which mpirun`
## ${MPIRUN} -machinefile $PBS_NODEFILE -np ${NPROCS} my-openmpi-program

### Or, just run your serial program
## $HOME/my-program


# PBS environment variables available in every batch job:
#
# $PBS_ENVIRONMENT set to PBS_BATCH to indicate that the job is a batch job; otherwise,
#                  set to PBS_INTERACTIVE to indicate that the job is a PBS interactive job
# $PBS_JOBID       the job identifier assigned to the job by the batch system
# $PBS_JOBNAME     the job name supplied by the user
# $PBS_NODEFILE    the name of the file that contains the list of nodes assigned to the job
# $PBS_QUEUE       the name of the queue from which the job is executed
# $PBS_O_HOME      value of the HOME variable in the environment in which qsub was executed
# $PBS_O_LANG      value of the LANG variable in the environment in which qsub was executed
# $PBS_O_LOGNAME   value of the LOGNAME variable in the environment in which qsub was executed
# $PBS_O_PATH      value of the PATH variable in the environment in which qsub was executed
# $PBS_O_MAIL      value of the MAIL variable in the environment in which qsub was executed
# $PBS_O_SHELL     value of the SHELL variable in the environment in which qsub was executed
# $PBS_O_TZ        value of the TZ variable in the environment in which qsub was executed
# $PBS_O_HOST      the name of the host upon which the qsub command is running
# $PBS_O_QUEUE     the name of the original queue to which the job was submitted
# $PBS_O_WORKDIR   the absolute path of the current working directory of the qsub command
#
# End of example PBS script
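The template computes NPROCS and NNODES from $PBS_NODEFILE, which lists one line per allocated processor slot, so a request for nodes=2:ppn=2 names each host twice. The sketch below adds a few hypothetical helpers in the same spirit: one prints the slot count per node, one uses $PBS_ENVIRONMENT and related variables to report how the script was started, and one builds a per-job log file name in the submission directory. The function names and the log-naming scheme are assumptions for illustration.

```sh
# Hypothetical helpers for use inside a job script.

# Print "host slots" for each node listed in a TORQUE node file
# (default: $PBS_NODEFILE). Sorting first means duplicate host names
# do not have to be adjacent for uniq -c to count them together.
cores_per_node() {
    sort "${1:-$PBS_NODEFILE}" | uniq -c | awk '{ print $2, $1 }'
}

# Report whether the script is running as a PBS batch job, a PBS
# interactive job, or outside PBS altogether.
job_context() {
    case "${PBS_ENVIRONMENT:-}" in
        PBS_BATCH)       echo "batch job ${PBS_JOBID} (${PBS_JOBNAME}) in queue ${PBS_QUEUE}" ;;
        PBS_INTERACTIVE) echo "interactive job ${PBS_JOBID}" ;;
        *)               echo "not running under PBS" ;;
    esac
}

# Build a log path in the directory qsub was run from, named after the
# job name and the numeric part of the job ID: job 44.headnode named
# myjob submitted from /home/alice/run gives /home/alice/run/myjob.44.log.
job_log_path() {
    printf '%s/%s.%s.log\n' "${PBS_O_WORKDIR:-$PWD}" "${PBS_JOBNAME:-job}" "${PBS_JOBID%%.*}"
}

# Example, inside the job script after the directives:
#   cores_per_node
#   job_context
#   $HOME/my-program > "$(job_log_path)" 2>&1
```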

