User Guide to OpenPBS/Maui on Theory sector clusters

The queueing system on most of the Theory sector clusters is OpenPBS or Torque with the Maui scheduler. Torque developed from OpenPBS, so the commands are the same. The rest of this document refers to PBS, but applies equally to the Torque clusters.

You need three main commands to use the system: showq, qsub, and qstat.

showq

This is a Maui command which displays the status of the queueing system. It shows three types of job: those running, those queued, and those blocked for various reasons. For running jobs, the time shown beside each job is the walltime left before the job finishes. For queued jobs, it is the time until the job's earliest calculated start, based on the jobs currently in the system. This estimate is liable to be inaccurate because it is based on the wallclock times requested by running jobs, and jobs often finish early.

qstat and checkjob

qstat is a PBS command that queries the PBS server about its queues and jobs.

'qstat -q' shows you the available queues. Note that 'qstat -q' doesn't give you all of the interesting information about any given queue. 'qstat -Q' gives a different display for each queue. If you want to know everything then 'qstat -Qf queuename' will give you all of the user-readable information about the queue called queuename.

'qstat -f jobid' gives a full listing of the queue details for the job whose number is jobid. This is useful for debugging when a job won't run or is running in the wrong place. If the job isn't running then there will usually be a comment in the -f output explaining why. Another useful command for this is 'checkjob jobid'.

'qstat -n jobid' tells you which node(s) the job with number jobid is running on. This is useful to know if you are writing to non-shared filesystems. However, you can also make the job write this information to your output file; see later.
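
One way to record the node list from inside the job is sketched below. It assumes the script is interpreted by a Bourne-compatible shell and that PBS has set $PBS_NODEFILE for the running job. The function name report_nodes is made up for this illustration, and the fallback to the local host name is only there so that the fragment can be tried outside a job.

```shell
# Print each distinct node assigned to this job, one name per line.
# PBS_NODEFILE is set by PBS inside a running job. Outside a job the
# function falls back to the local host name (an assumption made only
# so that the fragment can be tried on a login node).
report_nodes() {
    if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
        # The nodefile may list a node more than once, so drop repeats.
        sort -u "$PBS_NODEFILE"
    else
        uname -n
    fi
}

echo "Job running on:"
report_nodes
```

Placed near the top of a job script, this puts the node names at the start of the job's output file.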

'qstat' with no options shows you the PBS server's view of the current state of the job queues. This contains less useful information than the 'showq' command (which is the Maui scheduler's view) and so should only be used for debugging problems. It can actually be misleading because PBS cannot know about blocked jobs.

qsub

This is a PBS command.

qsub submits a job. In its simplest form you would run 'qsub myscript', where myscript is a shell script that runs your job. It must be a script and not a binary. Your shell script will be interpreted by your login shell; any #! line at the top is ignored. You can force PBS to use a different shell with the -S option to qsub.
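
Because the #! line is ignored, a script written in Bourne shell syntax can fail for users whose login shell is csh or tcsh unless a shell is requested explicitly. A minimal sketch, assuming /bin/sh exists on the compute nodes and that the -S option is given as a #PBS directive inside the script. The queue name and time limit are placeholders and may not suit your cluster.

```shell
#!/bin/sh
# The line above is ignored by PBS; the -S directive below is what counts.
#PBS -S /bin/sh
#PBS -q serial
#PBS -l walltime=0:05:00

# Bourne-style syntax is now safe to use, whatever your login shell is.
if [ -n "${PBS_O_WORKDIR:-}" ]; then
    cd "$PBS_O_WORKDIR" || exit 1
fi
workdir=$(pwd)
echo "Working in $workdir"
```

The same effect can be had without editing the script by running 'qsub -S /bin/sh myscript'.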

qsub has lots of command-line options, but you can make things easier by setting most of the available options within your job script. Here's a very short example job script to look at:

# Set some PBS options: queue name and max time for the job
#PBS -q serial
#PBS -l walltime=2:00:00

# Change to directory you submitted the job from
cd "$PBS_O_WORKDIR"

# Run program
/home/cen1001/myprog input.file 

All of the clusters have detailed example submit scripts in their /info/pbs directories. You should take an example script from the cluster you want to use and edit it instead of using this one; some things vary from machine to machine.

If you don't want to write a script, you can run an interactive job with the -I switch to qsub. In this case you need to give the queue on the command line, for example:

    qsub -I -q l2
    
This opens a session on the node assigned to the job. It looks like a remote login session, except that the initialisation isn't exactly the same. You can then run commands interactively on your assigned node until your walltime runs out. If you submit to a queue which assigns you multiple nodes you can examine the contents of the PBS nodefile ($PBS_NODEFILE) to see which other nodes you may use. It is generally much better to script jobs than to run interactively: scripted jobs can be restarted automatically if the node fails, and it doesn't matter if there isn't a node free exactly when you need one.

Other useful commands

Both Maui and PBS come with a whole suite of commands. PBS commands generally start with 'q' and are used for things like querying the queue setup and manipulating jobs in the queue. Maui commands have no common naming scheme. They tend to be for displaying information about when jobs will start or end.

qdel, qhold, and qrls will delete, hold, and release a job respectively. qalter will let you change certain parameters of a queued or running job. To find other PBS or Torque commands, run 'man -k pbs'. Other Maui commands are 'showbf', 'showres', and 'diagnose'. Maui does not come with man pages; the ones available on the system have been created from the online HTML documentation, so errors and omissions are the fault of cen1001 and not the Maui authors. The online docs are at http://www.supercluster.org/documentation/.
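
The job-manipulation commands all take a job number as their argument. The sketch below strings them together for a hypothetical job 12345. The run helper and the DRY_RUN variable are inventions of this example, not part of PBS: they print each command instead of executing it, so the fragment can be tried safely away from a cluster. The walltime value is likewise only illustrative, and whether qalter accepts a given change depends on how the server is configured.

```shell
# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

jobid=12345                              # hypothetical job number

run qhold "$jobid"                       # stop a queued job from starting
run qalter -l walltime=1:00:00 "$jobid"  # change its requested walltime
run qrls "$jobid"                        # release the hold
run qdel "$jobid"                        # or give up and delete the job
```

Set DRY_RUN=0 to have the helper execute the commands for real.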

Parallel jobs

Some of the clusters (Rama, Athens, Nimbus, Clust) have parallel queues. There are example submit scripts in the /info/pbs directory on those machines for the different libraries they support. The only generalization I can make here is that starting your job with mpirun inside your job script probably won't do what you want! Please read the documentation at http://www-theor.ch.cam.ac.uk/IT/servers/ for the machine, which should cover it.

Scheduling policy

The scheduling policy on the machines is roughly FIFO but with some extra Maui rules to make it fairer. Three features are used: the ability to make reservations for queued jobs, the throttling rules, and the flexible priority system.

The queue is first sorted on priority. The priority is made up of several weighted components: time on the queue (subject to throttling; see below), fairshare value, and job expansion factor. The scheduler then starts at the top of the queue and starts jobs until it reaches one that cannot run yet because there are not enough free nodes. The top one or two remaining queued jobs then have reservations made for them; this means that the scheduler works out the earliest time at which those jobs can definitely start and will not schedule anything that could possibly delay them. Finally the scheduler looks further down the queue and tries to backfill lower priority jobs around the reservations.

Throttling solves the problem of a large batch of jobs from one user crowding everyone else out. It limits the number of jobs that any user may have queued at any time to four. Excess jobs are placed in the Blocked state and won't be released until one of the user's four queued jobs has started. While a job is blocked its time on the queue is counted as zero, so it gets no priority gain from waiting in a blocked state. There are also per-user running job limits on some job types; see below.
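
The throttling rule can be pictured with a small model. This is only an illustration of the policy as described here, not how Maui implements it: it reads hypothetical "user jobid" pairs in queue order and labels each user's first four waiting jobs Queued and any further ones Blocked. The function name classify_jobs and the sample users are made up.

```shell
# classify_jobs [LIMIT]
# Reads "user jobid" lines in queue order from standard input and prints
# "jobid user state", where state is Queued for a user's first LIMIT
# waiting jobs (default 4) and Blocked for the rest.
classify_jobs() {
    awk -v limit="${1:-4}" '{
        count[$1]++
        state = (count[$1] <= limit) ? "Queued" : "Blocked"
        print $2, $1, state
    }'
}

printf '%s\n' 'alice 101' 'alice 102' 'alice 103' 'alice 104' \
              'alice 105' 'bob 106' | classify_jobs
```

Here the fifth job from alice is labelled Blocked, while the single job from bob is unaffected.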

Fairshare value contributes to priority. It is a measure of how much CPU time a user has had lately. If a user's recent usage is over the configured value then their priority decreases relative to other users, and if it is under then their priority increases.

The job expansion factor also counts towards priority, and helps short jobs when the machine is full. It is calculated as (1 + time on the queue / wall clock limit for job). This factor increases much more rapidly for short jobs than long jobs. Note that if a job gets blocked by the throttling rule then time spent blocked doesn't count towards the total time on the queue.
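
To see why the factor favours short jobs, the formula can be evaluated for two jobs that have waited equally long. A minimal sketch, assuming both times are given in seconds; the function name expansion_factor is made up for this example, and the weights and other priority components configured on the clusters are not shown.

```shell
# expansion_factor QUEUED_SECONDS WALLTIME_LIMIT_SECONDS
# Prints 1 + (time on the queue / wall clock limit) to two decimal places.
expansion_factor() {
    awk -v queued="$1" -v limit="$2" \
        'BEGIN { printf "%.2f\n", 1 + queued / limit }'
}

expansion_factor 7200 3600     # 1-hour job after 2 hours on the queue
expansion_factor 7200 86400    # 24-hour job after the same wait
```

After two hours on the queue, the job with a one-hour limit has a factor of 3.00, while the job with a 24-hour limit has reached only about 1.08.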

Finally, some queues have per-user limits on the number of jobs. If you find your jobs go into the Blocked state for no apparent reason, use 'diagnose -q' to see what is going on. The cause is probably a per-user job limit on the job class.