|
The queueing system on most of the Theory sector clusters is OpenPBS or Torque with the Maui scheduler. Torque was developed from OpenPBS, so the commands are the same. The rest of this document talks about PBS, but it applies equally to the Torque clusters.
You need three main commands to use the system: showq, qsub, and qstat.
showq
This is a Maui command which displays the status of the queueing system. It shows three types
of job: those running, those queued, and those blocked for various
reasons. For running jobs, the time shown is the walltime left before the job
reaches its requested limit. For queued jobs it is the time until the earliest
calculated start time for the job, based on the jobs currently in the
system. This estimate is liable to be inaccurate because it is based on
the walltimes requested by running jobs, and jobs often finish early.
qstat and checkjob
qstat is a PBS command that queries the PBS server for information about queues and jobs.
'qstat -q' shows you the available queues. Note that 'qstat -q'
doesn't give you all of the interesting information about any given
queue. 'qstat -Q' gives a different display for each queue. If you
want to know everything then 'qstat -Qf queuename' will give you all
of the user-readable information about the queue called queuename.
'qstat -f jobid' gives a full listing of the queue details for the
job whose number is jobid. This is useful for debugging when a job
won't run or is running in the wrong place. If the job isn't running then
there will usually be a comment in the -f output explaining why.
Another useful command for this is 'checkjob jobid'.
'qstat -n jobid' tells you which node(s) the job with number jobid is running on. This is
useful to know if you are writing to non-shared filesystems. However, you can
also make the job write this information to your output file; see later.
'qstat' with no options shows you the PBS server's view of the current state of the job queues. This contains less useful information than the 'showq' command (which is the Maui scheduler's view) and so should only be used for debugging problems. It can actually be misleading because PBS cannot know about blocked jobs.
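When a job won't start, the comment that 'qstat -f' usually includes can be pulled out of the full listing on its own. The following is a minimal sketch, assuming the listing marks that line as 'comment = ...' (check the output on your cluster, as this layout is an assumption). The name job_comment is made up for this example, and the function needs a Bourne-style shell.

```shell
# job_comment: hypothetical helper, not part of PBS or Maui.
# Reads 'qstat -f' output on standard input and prints the text of
# the comment line, if there is one. Assumes the line looks like
# "    comment = some text".
job_comment() {
    sed -n 's/^[[:space:]]*comment = //p'
}
```

You would use it as 'qstat -f jobid | job_comment'. A long comment may be wrapped onto following lines in the listing, which this sketch does not rejoin.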
qsub
This is a PBS command.
qsub submits a job. In its simplest form you would run
'qsub myscript', where myscript is a shell script that runs your job.
It must be a script and not a binary. Your shell script
will be interpreted by your login shell; any #! line at the top is
ignored. You can force PBS to use a different shell with the -S qsub
option.
qsub has lots of command-line options, but you can make things easier
by setting most of the available options within your job script. Here's a very short example job script to look at:
# Set some PBS options: queue name and max time for the job
#PBS -q serial
#PBS -l walltime=2:00:00
# Change to the directory you submitted the job from
cd "$PBS_O_WORKDIR"
# Run program
/home/cen1001/myprog input.file
All of the clusters have detailed example submit scripts in their /info/pbs directories. You should take an example script from the cluster you want to use and edit it instead of using this one; some things vary from machine to machine.
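A job script can also record which node(s) it was given, so that the information ends up in the job's output file rather than having to be looked up with 'qstat -n'. This is only a sketch: the function name log_nodes is made up for this example, and it assumes the script is run by a Bourne-style shell (if your login shell is something else, the -S option to qsub can select one).

```shell
# Hypothetical helper for the top of a job script: record in the job's
# output which node(s) PBS assigned. Assumes a Bourne-style shell.
log_nodes() {
    if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
        echo "Job running on:"
        # sort -u collapses any repeated node names
        sort -u "$PBS_NODEFILE"
    else
        echo "PBS_NODEFILE not set or unreadable; no node list available"
    fi
}

log_nodes
```

Calling log_nodes before the line that runs your program puts the node list at the top of the output file.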
If you don't want to write a script, you can run an interactive
qsub with the -I switch. In this case you need to
give the queue on the command line, for example
qsub -I -q l2
This opens a session on the node assigned to the job. It looks
like a remote login session, except that the initialisation isn't
exactly the same. You can then run commands interactively on your
assigned node until your walltime runs out. If you submit to a queue
which assigns you multiple nodes you can examine the contents of the
PBS nodefile ($PBS_NODEFILE) to see which other nodes you may use.
It is generally much better to script jobs
than to run interactively, as they can be restarted automatically if the node fails, and it doesn't matter if there isn't a node free exactly when you need one.
Other useful commands
Both Maui and PBS come with a whole suite of commands. PBS commands generally start with 'q' and are used for things like querying the queue setup and manipulating jobs in the queue. Maui commands have no common naming scheme. They tend to be for displaying information about when jobs will start or end.
qdel, qhold, and qrls will delete, hold, and release a job
respectively. qalter will let you change certain parameters of a
queued or running job. To find other PBS or Torque commands, do 'man -k pbs'. Other
Maui commands are 'showbf', 'showres', and 'diagnose'. Maui does
not come with man pages; the available ones on the system have been
created from the online HTML documentation so errors and omissions are
the fault of cen1001 and not the Maui authors. The online docs are
at http://www.supercluster.org/documentation/.
Parallel jobs
Some of the clusters (Rama, Athens, Nimbus, Clust) have parallel queues. There are example submit scripts in the /info/pbs directory on those machines for the different libraries they support. The only generalization I can make here is that starting your job with mpirun inside your job script probably won't do what you want! Please read the documentation at http://www-theor.ch.cam.ac.uk/IT/servers/ for the machine, which should cover it.
Scheduling policy
The scheduling policy on the machines is roughly FIFO but with some extra Maui rules to make it fairer. Three features are used: the ability to
make reservations for queued jobs, the throttling rules, and the
flexible priority system.
The queue is first sorted on priority. The priority is made up of several weighted
components: time on the queue (subject to throttling; see below),
fairshare value, and job expansion factor. The scheduler then starts at
the top of the queue and starts jobs until it reaches one that cannot run
yet because there are not enough free nodes. The top one or two remaining queued
jobs then have reservations made for them; this means that the scheduler
works out the earliest those jobs can definitely start and will not schedule anything that could possibly delay them. Finally the scheduler looks further down the queue and tries to
backfill lower priority jobs around the reservations.
Throttling solves the problem of a large batch of jobs from
one user crowding everyone else out. It limits the number of jobs that
any user may have queued at any time to four. Excess jobs are
placed in the Blocked state and won't be released until one of the
user's four queued jobs has started. While a job is blocked its time on
the queue is counted as zero, so it gets no priority gain from waiting in
a blocked state. There are also per-user running job limits on some job types; see below.
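To make the arithmetic of the throttling rule concrete, here is a small illustration. It is not a Maui command: the function throttle_split is made up for this example, it only models the limit of four queued jobs described above, and it ignores the separate per-user running job limits.

```shell
# throttle_split: hypothetical illustration of the throttling rule.
# Argument: how many of one user's jobs are waiting to run.
# Prints how many count as queued and how many are blocked.
throttle_split() {
    limit=4
    waiting=$1
    if [ "$waiting" -gt "$limit" ]; then
        echo "queued=$limit blocked=$((waiting - limit))"
    else
        echo "queued=$waiting blocked=0"
    fi
}

throttle_split 10
```

A user who submits ten jobs at once therefore has four queued and six blocked, and only the four queued jobs accumulate time on the queue.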
Fairshare value contributes to priority. It is a measure of how much CPU time a user has had lately. If a user's recent usage
is over the configured value
then their priority decreases relative to other users, and if it is under then their priority increases.
The job expansion factor also counts towards priority, and helps short jobs when the machine is full. It
is calculated as (1 + time on the queue / wall clock limit for job).
This factor increases much more rapidly for short jobs than long jobs.
Note that if a job gets blocked by the throttling rule then time spent
blocked doesn't count towards the total time on the queue.
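The formula can be tried out with a couple of numbers. This is a sketch of the calculation only, not of how Maui weights the result against the other priority components; the function name expansion_factor is made up for this example, and both arguments must be in the same units.

```shell
# expansion_factor: hypothetical helper illustrating the formula above.
# Arguments: time on the queue and walltime limit, in the same units.
expansion_factor() {
    awk -v queued="$1" -v limit="$2" 'BEGIN { printf "%.2f\n", 1 + queued / limit }'
}

expansion_factor 2 1     # 2 hours queued, 1-hour job
expansion_factor 2 24    # 2 hours queued, 24-hour job
```

After two hours on the queue the one-hour job has a factor of 3.00 while the 24-hour job has only reached 1.08, which is why short jobs gain priority faster.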
Finally, some queues have per-user job number limits on them. If you find your jobs go into the Blocked state for no apparent reason, use 'diagnose -q' to see what is going on. It is probably a per-user job limit on the job class.
|