
Putting Maui on the PBS clusters

Maui is a sophisticated scheduler that can act as a drop-in replacement for PBS's native scheduling daemon. It has many useful features that PBS lacks, such as job reservations, throttling policies, and very flexible prioritization.

You can get the source from http://www.supercluster.org. It compiled out of the box on Red Hat 9 and Debian 3. Apparently you do have to compile it as the user it is going to run as, so I created a 'maui' user with a disabled password. It is a bit like PBS in that it keeps its config and logs in the same directory by default, so on the rama cluster I created /var/maui for it.

The first thing to do before starting Maui is to edit /var/maui/maui.cfg and set the server mode from NORMAL to TEST. This allows Maui to pull data from PBS and make dummy scheduling decisions. You can therefore get the settings completely right before letting it schedule jobs for real.
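As a rough sketch of that edit, the function below rewrites the SERVERMODE line of a config file and leaves every other line alone. The name set_server_mode is made up for illustration, and the sketch assumes the file already contains a line starting with SERVERMODE.

```shell
# Hypothetical helper, not part of Maui: set the SERVERMODE line of a config file.
# Usage (assumed path from the text above): set_server_mode /var/maui/maui.cfg TEST
set_server_mode() {
    cfg=$1
    mode=$2
    tmp=$(mktemp) || return 1
    # Replace whatever follows SERVERMODE; all other lines pass through unchanged.
    sed "s/^SERVERMODE[[:space:]][[:space:]]*.*/SERVERMODE $mode/" "$cfg" > "$tmp" &&
        cat "$tmp" > "$cfg"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

Calling it again with NORMAL would switch back once the TEST-mode decisions look right.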

Maui directives are given in the form

PARAMETER VALUE [VALUE]

You can change them on the fly with the changeparam command, which is extremely helpful. However, watch out for the shell mangling your arguments; I find it's safest to quote everything. showconfig will print out the current scheduler configuration.
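To see the kind of mangling meant here, the sketch below uses echo in a scratch directory rather than any Maui command. Square brackets are glob characters, so an unquoted argument such as USERCFG[DEFAULT] can be replaced by a matching file name before the command ever sees it. The file USERCFGD is invented purely to trigger the match.

```shell
# Demonstration only: no Maui command is run here.
demo=$(mktemp -d)
touch "$demo/USERCFGD"    # stray file that the bracket pattern happens to match

# Unquoted, [DEFAULT] is a glob character class, so the shell substitutes the file name.
unquoted=$(cd "$demo" && echo USERCFG[DEFAULT])
# Quoted, the argument reaches the command unchanged.
quoted=$(cd "$demo" && echo 'USERCFG[DEFAULT]')

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
rm -rf "$demo"

# Assumed real-world form, following the PARAMETER VALUE pattern above
# (whether changeparam accepts this particular parameter is not confirmed here):
#   changeparam 'USERCFG[DEFAULT]' 'MAXIJOB=4'
```

The same applies to any argument containing *, ?, # or spaces, which is why quoting everything is the safe habit.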

One problem I had with Maui and PBS was that one of my PBS servers uses a patch to force a 'required property' onto each and every job. This makes sure that serial jobs ask for one processor only, and that parallel jobs ask for the appropriate number and get both processors on each assigned node. The syntax for this is something like -l nodes=1 or -l nodes=2#ppn=2, and Maui interprets this as you asking for a node with the property '1' or '2'. This is simple enough to get round: just add '1' to the PBS properties of each node. On more recent machines I have dropped this patch and haven't had a problem.

You start Maui by su'ing to maui and running /usr/local/sbin/maui, and you stop it with schedctl -k. Sending it a HUP doesn't seem to make it reread the config file, and some settings don't quite work with changeparam, so you have to restart it.
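Since a restart is the reliable way to pick up config changes, the two commands can be wrapped together. This is only a sketch: restart_maui, SCHEDCTL, MAUI_BIN and MAUI_STOP_WAIT are names invented here, the pause between stop and start is an assumption, and the function expects to be run as the maui user.

```shell
# Hypothetical wrapper, not shipped with Maui. Run it as the maui user.
# SCHEDCTL, MAUI_BIN and MAUI_STOP_WAIT are assumed override points for testing.
restart_maui() {
    # Stop the running scheduler; give up if that fails (e.g. it was not running).
    "${SCHEDCTL:-schedctl}" -k || return 1
    # Assumed grace period so the old process can exit.
    sleep "${MAUI_STOP_WAIT:-2}"
    # Start the scheduler again, which reads the config file afresh.
    "${MAUI_BIN:-/usr/local/sbin/maui}"
}
```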

The docs for Maui don't come with the source but are available online. However, for the benefit of my users I have mangled some of the online docs into local manpages. These are installed in /usr/local/man on rama. The Admin Guide is also available as a PDF, though it is a bit out of date compared with the HTML; a copy is in my office in the PBS folder.

My current scheduling policy is fairly simple. I avoid queue stuffing by setting:

USERCFG[DEFAULT] MAXIJOB=4
JOBPRIOACCRUALPOLICY            FULLPOLICY

i.e. only four idle jobs per user. The JOBPRIOACCRUALPOLICY setting means that time spent in the Blocked state (i.e. when a user has excess idle jobs) doesn't count towards priority.
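The effect of the idle-job limit can be pictured with a toy model. It is not how Maui implements the policy; it only assumes that queued jobs are considered in order and that each user's jobs beyond the limit are marked Blocked. The function name classify_idle and the "user jobid" input format are invented for illustration.

```shell
# Toy model of a per-user idle job limit (default 4, as with MAXIJOB=4).
# Reads "user jobid" lines in queue order and prints "jobid state".
classify_idle() {
    awk -v max="${1:-4}" '{
        seen[$1]++                                   # jobs counted so far for this user
        print $2, (seen[$1] <= max ? "Idle" : "Blocked")
    }'
}

# Six queued jobs: five from alice, one from bob.
printf '%s\n' 'alice 101' 'alice 102' 'alice 103' 'bob 104' 'alice 105' 'alice 106' |
    classify_idle 4
```

In this example alice's fifth job (106) comes out as Blocked while bob's single job stays Idle, which is the queue-stuffing protection the setting is there to provide.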

Backfill is optimized by making reservations for the top two queued jobs, i.e.

RESERVATIONDEPTH 2

The priority is currently set thus:

QUEUETIMEWEIGHT       1 
FSWEIGHT              1
FSUSERWEIGHT          1
XFACTORWEIGHT         100 # has to be quite big because queue time gets big

FSPOLICY              PSEDEDICATED
FSDEPTH               7
FSINTERVAL            86400
FSDECAY               0.80
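Some arithmetic helps show what these numbers do. The sketch below rests on my reading of the Maui documentation, so treat it as an assumption: queue time is counted in minutes, the expansion factor is 1 + queue time / wallclock limit, and each older fairshare window is scaled by a further factor of FSDECAY. Weights not listed above are left out, and the function names are invented.

```shell
# Queue-time plus expansion-factor contribution under QUEUETIMEWEIGHT 1 and
# XFACTORWEIGHT 100. Arguments: minutes queued, wallclock limit in minutes.
service_priority() {
    awk -v q="$1" -v w="$2" 'BEGIN { printf "%.1f\n", 1 * q + 100 * (1 + q / w) }'
}

# Relative weight of each of the 7 one-day fairshare windows (FSDEPTH 7,
# FSINTERVAL 86400) with FSDECAY 0.80; window 0 is the current day.
fs_window_weights() {
    awk 'BEGIN { w = 1; for (i = 0; i < 7; i++) { printf "%d %.4f\n", i, w; w *= 0.80 } }'
}

service_priority 600 60      # one-hour job that has waited ten hours
service_priority 600 6000    # 100-hour job that has waited ten hours
fs_window_weights
```

Under these assumptions the short job scores 1700.0 and the long job 710.0. With an XFACTORWEIGHT of 1 the two would differ by only about ten points out of six hundred, which is why the weight has to be large to matter next to the queue time.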

Since setting the policy above I have discovered a few other things that need to be set or are useful.

# Required so that if a node goes down, Maui doesn't keep trying to kill
# the job on it when it hits its walltime limit and then mailing the user to
# tell them it failed
JOBMAXOVERRUN INFINITY
# Allow negative job priorities
ENABLENEGJOBPRIORITY TRUE
# Restrict running job numbers in longer classes
CLASSCFG[serial_long_1gb] MAXPROCPERUSER=16
CLASSCFG[par_long_2proc] MAXPROCPERUSER=16
# This one improves utilization on machines with a mix of wide and narrow
# parallel jobs. It is great on athens and not required on sword or rama.
# It makes the scheduler try to place jobs in the 'best fit' space rather
# than the 'first fit', all other things being equal
NODEALLOCATIONPOLICY           LASTAVAILABLE
# This one reserves a node for a 'test' class
SRCFG[testres] HOSTLIST=comp00.ch.cam.ac.uk PERIOD=INFINITY CLASSLIST=test