|
Nimbus is a cluster of 33 dual-processor Opteron servers. The CPUs are Opteron 248s (2.2GHz clock speed) and each machine has 2GB of RAM. They all run SuSE Linux 9.3.
Nimbus can only be used by sshing into the head node, whose external name is
nimbus.ch.cam.ac.uk. Almost all work is done from there. In
particular, passwords and shells should only be changed on the head node. Every
node has a name on the cluster's internal network of the form
comp??.ch.cam.ac.uk. The compute nodes can be logged into from the head node
but not from outside. You should almost never need to log into a compute node.
Nimbus's firewall is configured to allow logins only from Chemistry machines.
Homespace is on a disk array attached to the head node. The /home filesystem
is 200GB in size. It is shared between all nodes, so you see the same home
directory wherever you (or your job) are on the machine. /home is backed up
nightly and two weeks of incrementals are kept. There are quotas on /home: currently the soft limit is 6GB and the hard limit is 8GB.

There is also a 900GB shared filesystem called /sharedscratch in which you will
have a directory. This is not backed up. At the moment I have no plans to purge
it regularly, so please clean up old files when you're done with them. Please
try to use /sharedscratch appropriately: it should be used for data that you
could recreate in a reasonable time. Try to avoid writing to it over the
network where you can, i.e. from a running serial job.

Each node also has
a local /scratch filesystem on which you will have your own directory. These
filesystems are about 50GB in size with no quota restriction and are the most
appropriate place for your jobs to write temporary files during a run, when this
is possible. They are local to each node and so considerably faster than the
NFS-mounted /home and /sharedscratch. Please clean up files on /scratch when
you are done with them; see the queueing documentation for how to find out
which node's /scratch to look at for each job.
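
As a rough sketch of the pattern described above, a job can make a private working directory under the node's local /scratch, run there, copy the results somewhere permanent, and remove the working directory afterwards. The function name run_in_scratch, the SCRATCH_ROOT variable, and the assumption that your directory is /scratch/<username> are illustrative and not part of the Nimbus setup, so check the real path on a node before relying on it.

```shell
#!/bin/sh
# Sketch only: SCRATCH_ROOT and the /scratch/<username> layout are assumptions.
SCRATCH_ROOT=${SCRATCH_ROOT:-/scratch}

# run_in_scratch RESULTS_DIR COMMAND [ARGS...]
# Runs COMMAND inside a fresh directory on local scratch, copies whatever it
# leaves behind into RESULTS_DIR, then deletes the scratch directory.
run_in_scratch() {
    results_dir=$1
    shift
    user=${USER:-$(id -un)}
    # Your per-user scratch directory must already exist.
    workdir=$(mktemp -d "$SCRATCH_ROOT/$user/job.XXXXXX") || return 1

    # Use a subshell so the caller's current directory is left alone.
    ( cd "$workdir" && "$@" )
    status=$?

    # Keep the output even if the command failed, then tidy up scratch.
    mkdir -p "$results_dir" && cp -R "$workdir"/. "$results_dir"/
    rm -rf "$workdir"
    return $status
}
```

Something like `run_in_scratch "$HOME/results" "$HOME/bin/my_program"` (my_program being a hypothetical executable) would then leave only the finished output in your home directory. Give the command an absolute path, since it runs from inside the scratch directory.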
The following software is installed on all nodes in addition to a very
minimal Linux installation:
- Intel Fortran and C 8.x (ifort/icc) in 32 and 64 bit versions
- Intel Fortran and C 9.0 (ifort/icc) in 32 and 64 bit versions
- Intel Math Kernel Library 7.2.1 in 32 and 64 bit versions
- Portland Group Fortran (pgf90, pgf77) and C compilers (pgcc) releases 6.1, 6.0 and 5.2 in 32 and 64 bit versions
- GNU compilers (gcc, g77)
- SCore parallel environment
- LAM-MPI parallel environment
- MPICH parallel environment
In order to manage all the combinations, the modules environment is installed. By default the 64-bit Portland 6.0-2, 64-bit ifort 8.1.026, and SCore modules are loaded, as these are likely to be the most popular.
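
The usual module commands can be used to inspect and change what is loaded. The sketch below is illustrative only: the actual module names on Nimbus are not listed in this document, so EXAMPLE_MODULE is a placeholder that you would replace with a name taken from the `module avail` listing.

```shell
# EXAMPLE_MODULE is a placeholder, not a real Nimbus module name.
EXAMPLE_MODULE=${EXAMPLE_MODULE:-some/module}

# Only call module if this shell has it defined (it normally will on Nimbus).
if type module >/dev/null 2>&1; then
    module list                      # modules loaded in this shell
    module avail                     # all modules installed on the system
    module load "$EXAMPLE_MODULE"    # add one to the environment
    module unload "$EXAMPLE_MODULE"  # and remove it again
else
    echo "module command not available in this shell"
fi
```

Changes made this way only affect the current shell, so a job that needs a non-default compiler or parallel environment would need the corresponding load commands in its own script.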
The head node also has some extra software packages, such as popular
editors, as it is intended for interactive work. If there is a package missing
from the head node that you would like to use then please ask; it will probably
be possible to install it provided it is a sensible size.
The parallel environments on the system are dealt with in a separate document. SCore is the default as it generally outperforms LAM.
All compute jobs should be run through the queueing system as there is an interactive CPU
time limit on the nodes. The queueing system will run each job on a free compute node,
copying the output back to a user-specified file at the end of the job. The only reason you
would log into a compute node is to clean up old scratch files.

The queueing system is Torque with the Maui scheduler. Torque is almost identical to OpenPBS from the user's point of view, so the system will be familiar to anyone who has used the local Athens, Rama, Sword, or Destiny clusters. The main difference here is the queue names. Read the OpenPBS/Maui introduction for how to submit jobs, and Nimbus's queue setup to see what queues are available on this particular machine.
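
For orientation, a minimal serial job script might look something like the sketch below. The job name and resource requests are made-up examples, and no queue is named because the queue names are specific to Nimbus and described in the queue setup document. Treat it as a starting point and not as the recommended configuration.

```shell
#!/bin/sh
# Torque reads the "#PBS" comment lines below as options to qsub.
# Job name shown in the queue listing (hypothetical).
#PBS -N example_job
# One processor on one node, for at most one hour (example values).
#PBS -l nodes=1:ppn=1
#PBS -l walltime=01:00:00
# Merge standard error into the standard output file.
#PBS -j oe
# No queue is chosen here; a "-q" option would select one by name.

# PBS_O_WORKDIR (the directory qsub was run from) and PBS_JOBID are set by
# Torque; the fallbacks only exist so the script can be tried by hand.
cd "${PBS_O_WORKDIR:-$PWD}" || exit 1
JOB_ID=${PBS_JOBID:-manual.$$}

# Record the node name so you know which /scratch to tidy up afterwards.
echo "job $JOB_ID started on $(uname -n)"

# The real work would go here.
```

Saved as, say, job.sh, this would be submitted with `qsub job.sh`; `qstat` then lists queued and running jobs, and `qdel` followed by a job ID removes one.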
Problems with Nimbus should be reported to <cen1001@cam.ac.uk> in the first instance.
|