|
Maui is a sophisticated scheduler that can act as a drop-in replacement
for PBS's native scheduling daemon. It has many useful features that are
not in PBS such as job reservations, throttling policies, and very
flexible prioritization.
You can get the source from http://www.supercluster.org. It
compiled out of the box on Red Hat 9 and Debian 3. Apparently you do have to compile
it as the user it is going to run as, so I created a 'maui'
user with a disabled password. Like PBS, it keeps its
config and logs in the same directory by default, so on the rama cluster I created /var/maui for
it.
The first thing to do before starting Maui is to edit
/var/maui/maui.cfg and set the server mode from NORMAL
to TEST. This allows Maui to pull data from PBS and make dummy
scheduling decisions. You can therefore get the settings completely right
before letting it schedule jobs for real.
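As a rough sketch of that edit, the function below rewrites the mode line in a config file. It assumes the parameter is spelled SERVERMODE, as in the stock maui.cfg; the function name set_server_mode is just an illustration, not something that ships with Maui.

```shell
#!/bin/sh
# Sketch: flip the SERVERMODE line in a Maui config file.
# Usage: set_server_mode FILE MODE   (MODE is e.g. TEST or NORMAL)
# Assumption: the mode is set by a line beginning with SERVERMODE.
set_server_mode() {
    cfg=$1
    mode=$2
    tmp="$cfg.tmp.$$"
    # Rewrite only the line that starts with SERVERMODE and leave
    # every other line untouched.
    sed "s/^SERVERMODE[[:space:]][[:space:]]*.*/SERVERMODE $mode/" "$cfg" > "$tmp" &&
        mv "$tmp" "$cfg"
}

# Example, run as the maui user before starting the scheduler:
# set_server_mode /var/maui/maui.cfg TEST
# grep '^SERVERMODE' /var/maui/maui.cfg
```

Going back to live scheduling is the same call with NORMAL as the second argument.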
Maui directives are given in the form
PARAMETER VALUE [VALUE]
You can change them on-the-fly with the changeparam command,
which is extremely helpful. However, watch out for the shell mangling your
arguments; I find it's safest to quote everything. showconfig
will print out the current scheduler configuration.
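To illustrate the kind of mangling meant here: a name like USERCFG[DEFAULT] is a glob pattern as far as the shell is concerned, so an unquoted argument can silently turn into a filename. The demonstration below needs no Maui at all; the scratch directory and the file USERCFGD are made up for the purpose. The changeparam lines at the end are commented out, and whether a given parameter can actually be changed at runtime is an assumption to check on your own installation.

```shell
#!/bin/sh
# USERCFG[DEFAULT] means "USERCFG followed by one of D, E, F, A, U, L, T"
# to the shell, so it expands if a matching file exists in the current dir.
demo_dir=$(mktemp -d)
touch "$demo_dir/USERCFGD"        # a file that happens to match the pattern

unquoted=$(cd "$demo_dir" && echo USERCFG[DEFAULT])    # glob is expanded
quoted=$(cd "$demo_dir" && echo 'USERCFG[DEFAULT]')    # passed through as typed

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
rm -rf "$demo_dir"

# On a live system, quote both arguments, for example:
# changeparam 'RESERVATIONDEPTH' '2'
# showconfig | grep 'RESERVATIONDEPTH'
```

The unquoted form comes out as USERCFGD, which is not what you meant to send to the scheduler.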
One problem I had with Maui and PBS was that one of my PBS servers uses a patch to
force a 'required property' onto each and every job. This is used to make
sure that serial jobs ask for one processor only, and that parallel jobs ask for the
appropriate number while getting both processors on each
assigned node. The syntax for this is something like -l
nodes=1 or -l nodes=2#ppn=2, and Maui interprets this as you asking for a node with the
property '1' or '2'. This is simple enough to get round: just add '1' to the PBS
properties of each node. On more recent machines I have dropped this patch
and haven't had a problem.
You start Maui by su'ing to maui and running /usr/local/sbin/maui,
and you stop it with schedctl -k. Sending it a HUP doesn't seem to make
it reread the config file, and some settings don't quite work with
changeparam, so for those you have to restart it.
The docs for Maui don't come with the source but are available
online. For the benefit of my users I have mangled some of the
online docs into local manpages, which are installed in /usr/local/man on
rama. The Admin Guide is also available as a PDF, though it is a bit
out of date compared with the HTML; a copy of the PDF version is in my office in the
PBS folder.
My current scheduling policy is fairly simple. I avoid queue stuffing by
setting:
USERCFG[DEFAULT] MAXIJOB=4
JOBPRIOACCRUALPOLICY FULLPOLICY
i.e. only four idle jobs per user. The JOBPRIOACCRUALPOLICY setting means that
time spent in the Blocked state (i.e. when a user has excess idle jobs)
doesn't count towards priority.
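A toy model of the idle-job limit above may make the effect clearer. This is not Maui's code, only a sketch that assumes queued jobs are considered in the order listed; the function name classify_jobs and the job and user names are invented for the example.

```shell
#!/bin/sh
# classify_jobs LIMIT: read "jobid user" lines for queued (not yet running)
# jobs on stdin, in queue order, and label each one Idle or Blocked.
# The first LIMIT jobs per user are Idle; any further ones are Blocked.
classify_jobs() {
    awk -v limit="$1" '
        {
            count[$2]++
            state = (count[$2] <= limit) ? "Idle" : "Blocked"
            print $1, $2, state
        }'
}

# Six queued jobs from alice and one from bob, with a limit of four:
printf '%s\n' 'j1 alice' 'j2 alice' 'j3 bob' 'j4 alice' \
              'j5 alice' 'j6 alice' 'j7 alice' | classify_jobs 4
```

Here j6 and j7 come out as Blocked, and under FULLPOLICY they would not start accruing queue-time priority until earlier jobs from alice have started.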
Backfill is optimized by making reservations for the top two queued jobs, i.e.
RESERVATIONDEPTH 2
The priority is currently set thus:
QUEUETIMEWEIGHT 1
FSWEIGHT 1
FSUSERWEIGHT 1
XFACTORWEIGHT 100 # has to be quite big because queue time gets big
FSPOLICY PSEDEDICATED
FSDEPTH 7
FSINTERVAL 86400
FSDECAY 0.80
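Some rough arithmetic shows what these numbers do. The sketch below assumes, as I understand the Maui documentation, that queue time is counted in minutes, that the expansion factor is (queue time + walltime limit) / walltime limit, and that each older fairshare window is scaled by a further factor of FSDECAY. The function names are invented, and the fairshare term of the priority is left out.

```shell
#!/bin/sh
# fs_window_weights DEPTH DECAY: print the weight applied to each fairshare
# window, newest (window 0) first. With FSINTERVAL 86400 each window is a day.
fs_window_weights() {
    awk -v depth="$1" -v decay="$2" 'BEGIN {
        w = 1
        for (i = 0; i < depth; i++) {
            printf "window %d weight %.4f\n", i, w
            w *= decay
        }
    }'
}

# service_priority QUEUED_MIN LIMIT_MIN QTWEIGHT XFWEIGHT: queue-time plus
# expansion-factor contribution to a job's priority (assumed formula).
service_priority() {
    awk -v q="$1" -v l="$2" -v qw="$3" -v xw="$4" 'BEGIN {
        xf = (q + l) / l
        printf "%.1f\n", qw * q + xw * xf
    }'
}

fs_window_weights 7 0.80
service_priority 600 60 1 100     # queued 10 hours, 1 hour walltime limit
```

For the job in the last line, queue time contributes 600 and the expansion factor of 11 contributes 1100. With XFACTORWEIGHT left at 1 the expansion factor would be lost in the noise, which is why it is set to 100. Usage from a week ago still counts at about a quarter of today's.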
Since setting the policy above I have discovered a few other things that
need to be set or are useful.
# Required so that if a node goes down, Maui doesn't keep trying to kill
# the job on it when it hits its walltime limit and then mailing the user to
# tell them it failed
JOBMAXOVERRUN INFINITY
# Allow negative job priorities
ENABLENEGJOBPRIORITY TRUE
# Restrict running job numbers in longer classes
CLASSCFG[serial_long_1gb] MAXPROCPERUSER=16
CLASSCFG[par_long_2proc] MAXPROCPERUSER=16
# This one improves utilization on machines with a mix of wide and narrow
# parallel jobs. It is great on athens and not required on sword or rama.
# It makes the scheduler try to place jobs in the 'best fit' space
# rather than the 'first fit', all other things being equal
NODEALLOCATIONPOLICY LASTAVAILABLE
# This one reserves a node for a 'test' class
SRCFG[testres] HOSTLIST=comp00.ch.cam.ac.uk PERIOD=INFINITY CLASSLIST=test
|