Checkpointing and Prelinking
Jul 29th
I am using blcr (a kernel-level checkpointing facility) for my research, and it’s quite cool. You don’t have to modify your application; it’s used as follows:
cr_run my_app
cr_checkpoint --term my_app_pid   # creates a context.pid file and kills the process
cr_restart context.pid            # voila! start from where it was checkpointed
In theory, you can move the checkpoint files to another machine with the same kernel, but I was experiencing segfaults. I contacted Paul, one of the developers (a nice guy who has been helping me a lot with patches etc.), and he told me that the libraries on both machines must be identical and must load at the same addresses. The problem is that prelink modifies the libraries on each machine, so the libraries on the two machines were no longer the same. So I ran prelink -u to undo the prelinking, and checkpoint/restart is working!
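A quick way to spot this kind of mismatch before moving a checkpoint is to fingerprint the shared libraries the application loads on each machine and compare the results. The sketch below is only an illustration and not part of blcr: the function names ldd_paths and lib_fingerprint are made up here, and it assumes ldd, awk and md5sum are available. Matching checksums do not prove that the libraries load at the same addresses, but differing checksums are a clear warning sign.

```shell
#!/bin/sh
# ldd_paths: read `ldd` output on stdin and print the resolved library paths.
# (Hypothetical helper name; only lines of the form "name => /path (addr)" are kept.)
ldd_paths() {
    awk '$2 == "=>" && $3 ~ /^\// { print $3 }' | sort -u
}

# lib_fingerprint BINARY: print one "checksum  path" line per shared library
# that BINARY resolves on this machine.
lib_fingerprint() {
    ldd "$1" | ldd_paths | while read -r lib; do
        md5sum "$lib"
    done
}
```

Running lib_fingerprint my_app on both machines, saving the output to a file on each, and diffing the two files shows which libraries differ.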
Btw, I am working on a fault-tolerant (ft) scheduler for large-scale systems. It’s still a pipe dream. Currently, I have blcr integrated with Torque+Maui, and I can checkpoint/restart serial jobs with my ft scheduler. I am working on MPI jobs and, hopefully, I will have something out to the public by the end of September.
Building a Quick and Dirty Linux Cluster with Torque and Maui
Jul 6th
Building production-grade clusters is a difficult task, and entire books have been written on the topic. There are various cluster-building toolkits, like OSCAR and Rocks, that can be used to build large clusters. I had to build a small cluster for my research, and I wanted to build it as quickly as possible instead of getting mired in configuring a ton of packages. Here’s how I did it.
Software
So, we need
- A resource manager – I am using Torque, an improved version of OpenPBS (open source variant of PBS)
- A scheduler – Torque comes with a default scheduler, but that is too simple for my needs. I am using Maui, a popular cluster scheduler. Maui integrates nicely with Torque.
- An MPI package – I am using LAM/MPI as it is integrated with blcr (checkpoint/restart tools). I need blcr for my fault tolerance research.
I have chosen FC3 (Fedora Core 3) as the distribution for all the machines in the cluster. It doesn’t matter much as we will be building all the above tools from source. Note that blcr is still experimental for 2.6 kernels.
Head node setup
We have to set up NIS, NFS and the Torque server component on the head node. The NIS HOWTO and NFS HOWTO should help you set up the NIS and NFS servers. The basic idea is to keep authentication and home directories on a central server.
Use a mountable home directory for compiling the tools. This makes it easier to compile once and install on multiple nodes.
Set up Torque as follows
- Download Torque
wget http://www.clusterresources.com/downloads/torque/torque-1.2.0p4.tar.gz
- Compile and install as root
./configure
make
su
make install
Compute node setup
On a compute node, we only have to install Torque’s client component.
- Install client components
cd src/resmom
su
make install
Configuring client nodes
Create the file /usr/spool/PBS/mom_priv/config and add the following lines
$clienthost <head node ip>
$logevent 255
$restricted <head node ip>
Note that you have to specify the IP address. Specifying a host name will not work.
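With more than a couple of compute nodes, it can be handy to generate this file from a script. The following is a minimal sketch under a few assumptions: the function names is_ipv4 and write_mom_config are hypothetical, the IP check only tests for a dotted-quad shape (it does not validate octet ranges), and the default output path is the one used above.

```shell
#!/bin/sh
# is_ipv4 ADDR: succeed only if ADDR looks like a dotted-quad IPv4 address.
# (Shape check only; octet ranges are not validated.)
is_ipv4() {
    printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'
}

# write_mom_config HEAD_IP [FILE]: write the three-line pbs_mom config.
# FILE defaults to the mom_priv/config path used in this post.
write_mom_config() {
    head_ip=$1
    out=${2:-/usr/spool/PBS/mom_priv/config}
    if ! is_ipv4 "$head_ip"; then
        echo "write_mom_config: '$head_ip' is not an IP address" >&2
        return 1
    fi
    # \$ keeps the leading dollar sign literal inside the here-document.
    cat > "$out" <<EOF
\$clienthost $head_ip
\$logevent 255
\$restricted $head_ip
EOF
}
```

For example, write_mom_config 192.168.1.1 (a made-up address) run as root on each compute node writes the file in place, while passing a host name makes the function refuse and return an error.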
Configuring head node
In the unzipped Torque directory, run
./torque.setup <admin user>
You can use any user name that can log in to the head node as the admin. I prefer root.
Add the compute nodes to the /usr/spool/PBS/server_priv/nodes file. You can just list them by host name, one per line. For example,
cnode1.testbed
cnode2.testbed
...
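If the compute nodes follow a numbered naming scheme like the one above, the list can be generated instead of typed. This is just a sketch: gen_nodes is a hypothetical helper, and it assumes your nodes are named PREFIX1.DOMAIN, PREFIX2.DOMAIN and so on.

```shell
#!/bin/sh
# gen_nodes PREFIX COUNT DOMAIN: print PREFIX1.DOMAIN ... PREFIX<COUNT>.DOMAIN,
# one host name per line, in the format expected by the nodes file.
gen_nodes() {
    i=1
    while [ "$i" -le "$2" ]; do
        printf '%s%d.%s\n' "$1" "$i" "$3"
        i=$((i + 1))
    done
}
```

For instance, gen_nodes cnode 8 testbed > /usr/spool/PBS/server_priv/nodes (run as root) would list eight nodes.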
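Now, run the following command as the admin user to shut down the server that torque.setup started, so that it reads the updated nodes file the next time it starts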
qterm -t quick
Setting up the scheduler
- Download, compile and install maui
wget http://www.clusterresources.com/downloads/maui/maui-3.2.6p13.tar.gz
tar zxvf maui-3.2.6p13.tar.gz
cd maui-3.2.6p13
./configure
make
make install    # as root
Maui is integrated with Torque, so no specific configuration is needed.
Run the servers
On the head node
pbs_server
maui
On the client nodes
pbs_mom
You can use the following init scripts for easy start/restart of the servers.
torque.client (start/stop/restart pbs_mom)
#!/bin/sh
# chkconfig: 2345 30 80
# description: TORQUE is a scalable resource manager which manages jobs in \
#              cluster environments.
# Source function library.
if [ -f /etc/init.d/functions ] ; then
    . /etc/init.d/functions
elif [ -f /etc/rc.d/init.d/functions ] ; then
    . /etc/rc.d/init.d/functions
else
    exit 0
fi
PREFIX=/usr/local
# Read in the command arguments
case "$1" in
    start)
        # Start the TORQUE compute-node daemon (pbs_mom)
        echo -n $"Starting TORQUE services: "
        daemon $PREFIX/sbin/pbs_mom
        echo
        ;;
    stop)
        # Stop pbs_mom
        echo -n $"Shutting down TORQUE services: "
        killproc pbs_mom
        echo
        ;;
    restart)
        $0 stop
        $0 start
        ;;
    *)
        echo $"Usage: torque {start|stop|restart}"
        exit 1
        ;;
esac
exit 0
torque.server (start/restart/stop pbs_server and maui)
#!/bin/sh
# chkconfig: 2345 30 80
# description: TORQUE is a scalable resource manager which manages jobs in \
#              cluster environments.
# Source function library.
if [ -f /etc/init.d/functions ] ; then
    . /etc/init.d/functions
elif [ -f /etc/rc.d/init.d/functions ] ; then
    . /etc/rc.d/init.d/functions
else
    exit 0
fi
PREFIX=/usr/local
# Read in the command arguments
case "$1" in
    start)
        # Start TORQUE services first...
        echo -n $"Starting TORQUE services: "
        daemon $PREFIX/sbin/pbs_server
        echo
        # Next start the Maui scheduler...
        echo -n $"Starting Maui scheduler: "
        daemon $PREFIX/maui/sbin/maui
        echo
        ;;
    stop)
        # Stop Maui first...
        echo -n $"Shutting down Maui scheduler: "
        killproc maui
        echo
        # ...then stop pbs_server
        echo -n $"Shutting down TORQUE services: "
        killproc pbs_server
        echo
        ;;
    restart)
        $0 stop
        $0 start
        ;;
    *)
        echo $"Usage: torque {start|stop|restart}"
        exit 1
        ;;
esac
exit 0
You may have to change the prefixes depending on where you installed Torque and Maui.
Install MPI
Get the source from the LAM/MPI site and install it on all the machines. Configuration details coming soon.
Test the cluster
Once the servers are running, you are ready to test the cluster. Run
pbsnodes -a
to see the status of all nodes.
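The full pbsnodes -a listing gets long once there are more than a few nodes. As a rough sketch, the output can be condensed to one line per node with a small awk filter. The function name node_states is made up here, and the filter assumes the usual layout in which each node name starts at column one and its attributes (including "state = ...") follow on indented lines.

```shell
#!/bin/sh
# node_states: read `pbsnodes -a` output on stdin and print "node state" pairs.
# Assumes node names are unindented and attributes are indented "key = value" lines.
node_states() {
    awk '
        /^[^[:space:]]/ { node = $1 }
        $1 == "state" && $2 == "=" { print node, $3 }
    '
}
```

Use it as pbsnodes -a | node_states. Nodes reported as down usually mean that pbs_mom is not running on them or cannot reach the head node.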
Submit a job using qsub.
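For a first test, a tiny job script is enough. The sketch below writes one to a file; write_test_job and the job name hello_cluster are arbitrary choices for this example, while the #PBS directives and the PBS_O_WORKDIR and PBS_NODEFILE variables are standard Torque features.

```shell
#!/bin/sh
# write_test_job FILE: write a minimal Torque job script to FILE.
# The here-document is quoted so the variables are expanded when the job runs,
# not when the script is written.
write_test_job() {
    cat > "$1" <<'EOF'
#!/bin/sh
#PBS -N hello_cluster
#PBS -l nodes=1
#PBS -j oe
# Jobs start in the home directory; move to the directory qsub was run from.
cd "$PBS_O_WORKDIR"
echo "Running on $(hostname)"
# List the node(s) allocated to this job.
cat "$PBS_NODEFILE"
EOF
}
```

After write_test_job hello.sh, submit it as a regular user with qsub hello.sh and watch it with qstat. When the job finishes, its output should appear in a file named like hello_cluster.o<job id> in the submission directory.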
Resources
There are many more issues involved in running a cluster. This article should give you a head start and get you up and running with a small cluster. Check the following resources for more details.
- Building Linux Clusters by David HM Spector
- Torque resource manager
- Maui cluster scheduler