IRIDIA cluster maintenance
This page contains information on maintaining the cluster: installing new software, adding and removing nodes, security, etc.
Adding a new diskless node
To get a new, fully functional client, the server must first be configured to allow the client to boot from the network. Then, the new client must be added to the client list of the Sun Grid Engine (SGE). The current client kernel assumes that the client has an Intel PRO/1000 network card; at the moment, other cards require recompiling the kernel and further modifications to the net-booting process. To prepare the client:
- switch the client on while it is attached to a keyboard and a monitor;
- enter the BIOS setup and configure the client so that it does not halt when the keyboard, video card, floppy drive, or any other device is missing;
- configure it to boot from LAN;
- let it boot and, if the MAC address of the network card is displayed, write it down; otherwise, switch the client off.
Finding the MAC address of a new client
The MAC address is a sequence of 12 hexadecimal digits, usually written in pairs, with the pairs separated by a colon (":") or a space. If you do not have it, you can find it as follows.
On the server, type the following:
tail -f /var/log/daemon.log
Switch on the client and let it boot from the network (the boot will fail). Now look at the server's screen: a line like the following will appear:
DHCPDISCOVER from 00:13:16:69:71:fa via eth1
The hexadecimal digits between "from" and "via" are the MAC address.
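When several clients are added at once, the log can be filtered instead of read by eye. The following is a minimal POSIX shell sketch, assuming the "DHCPDISCOVER from <MAC>" log format shown above; `macs_from_dhcp_log` and `normalize_mac` are hypothetical helper names, not existing cluster tools. The second function accepts either separator described above (colon or space) and prints the lowercase, colon-separated form used in the dhcpd.conf example below.

```shell
#!/bin/sh
# Hypothetical helper: print each distinct MAC address that appears in
# "DHCPDISCOVER from <MAC>" lines read from standard input, lowercased.
# Assumes the log format shown above; other DHCP messages are ignored.
macs_from_dhcp_log() {
    sed -n 's/.*DHCPDISCOVER from \([0-9A-Fa-f:]\{1,\}\).*/\1/p' |
        tr 'A-F' 'a-f' |
        sort -u
}

# Hypothetical helper: turn "00 13 16 69 71 FA" or "00:13:16:69:71:fa"
# into "00:13:16:69:71:fa". Returns 1 if the input is not 12 hex digits.
normalize_mac() {
    # Remove the separators (colons or spaces) and lowercase the digits.
    hex=$(printf '%s' "$1" | tr -d ': ' | tr 'A-F' 'a-f')
    # Reject anything that is not exactly 12 hexadecimal digits.
    case $hex in
        *[!0-9a-f]*) return 1 ;;
    esac
    [ "${#hex}" -eq 12 ] || return 1
    # Put a colon after every pair of digits, then drop the trailing one.
    printf '%s\n' "$hex" | sed 's/\(..\)/\1:/g; s/:$//'
}
```

For example, `macs_from_dhcp_log < /var/log/daemon.log` lists every client that has tried to boot so far. Because `sort -u` waits for the end of its input, give it a finished file rather than the output of `tail -f`.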
Next come the final steps. Suppose that the MAC address is 00:13:16:69:71:fa, that the new host name will be p69, and that its IP address will be 192.168.100.69. On the server, edit the file
/etc/dhcpd.conf
Find the block where the other nodes are defined (search, for instance, for "host p02") and add the following line after the last host definition in that group:
host p69 { hardware ethernet 00:13:16:69:71:fa; fixed-address 192.168.100.69; }
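The entry can also be generated rather than typed, which avoids typos in the MAC address. The sketch below is a hypothetical POSIX shell helper, `dhcp_host_entry`; it assumes, by extrapolation from the p69 example, that node N is named pNN (zero-padded to two digits, as in p02) and gets the address 192.168.100.N. Check that assumption against the existing entries before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: print a dhcpd.conf host entry for node number N.
# ASSUMPTION: node N is called pNN and has the address 192.168.100.N.
dhcp_host_entry() {
    num=$1
    mac=$2
    # Accept only a plain decimal number.
    case $num in
        ''|*[!0-9]*) return 1 ;;
    esac
    # Strip leading zeros so printf does not read "08" as octal.
    while [ "${num#0}" != "$num" ]; do
        num=${num#0}
    done
    # Keep the last octet within 1..254.
    [ -n "$num" ] && [ "$num" -le 254 ] || return 1
    printf 'host p%02d { hardware ethernet %s; fixed-address 192.168.100.%d; }\n' \
        "$num" "$mac" "$num"
}
```

For example, `dhcp_host_entry 69 00:13:16:69:71:fa` prints exactly the line given above; review the output before pasting it into /etc/dhcpd.conf.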
Restart the DHCP server:
/etc/init.d/dhcp restart
Add the new host to /etc/hosts:
... 192.168.100.69 p69 ...
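Adding a second line for the same name or address to /etc/hosts is an easy mistake. A minimal sketch of a guard, assuming the usual /etc/hosts layout (address first, then names); `hosts_line_if_missing` is a hypothetical helper that prints the new line only if neither the address nor the name already appears outside comment lines.

```shell
#!/bin/sh
# Hypothetical helper: print "IP<TAB>NAME" unless IP or NAME is already
# listed in the hosts file (default /etc/hosts). Comment lines are skipped.
hosts_line_if_missing() {
    ip=$1
    name=$2
    file=${3:-/etc/hosts}
    if awk -v ip="$ip" -v name="$name" '
        /^[ \t]*#/ { next }                 # skip comment lines
        $1 == ip { found = 1 }              # address already listed
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit !found }                 # exit 0 only if found
    ' "$file"; then
        echo "hosts_line_if_missing: $name or $ip already in $file" >&2
        return 1
    fi
    printf '%s\t%s\n' "$ip" "$name"
}
```

For example, `hosts_line_if_missing 192.168.100.69 p69 >> /etc/hosts` appends the line only when it is not a duplicate.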
Add the entries that export the client's file systems to /etc/exports:
/var/lib/diskless/default/192.168.100.69/etc 192.168.100.69(rw,no_root_squash)
/var/lib/diskless/default/192.168.100.69/rw 192.168.100.69(ro,no_root_squash)
/var/lib/diskless/default/192.168.100.69/rw-secure 192.168.100.69(rw,no_root_squash)
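The three exports entries differ only in the client address, so they can be produced from it. A minimal sketch with the hypothetical helper name `exports_entries`; it reproduces the paths and options given in this section and does not check them against the directories the diskless setup actually creates.

```shell
#!/bin/sh
# Hypothetical helper: print the three /etc/exports entries for one client,
# copying the paths and options used for 192.168.100.69 on this page.
exports_entries() {
    ip=$1
    [ -n "$ip" ] || return 1
    base=/var/lib/diskless/default/$ip
    printf '%s %s(rw,no_root_squash)\n' "$base/etc" "$ip"
    printf '%s %s(ro,no_root_squash)\n' "$base/rw" "$ip"
    printf '%s %s(rw,no_root_squash)\n' "$base/rw-secure" "$ip"
}
```

For example, run `exports_entries 192.168.100.69 >> /etc/exports` and then restart the NFS server as described next.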
Restart the NFS server:
/etc/init.d/nfs-kernel-server restart
And finally, execute:
update-host-directories
Then the host must be added to the Sun Grid Engine. Follow the instructions in Chapter 2, "How to Install Execution Host", of the Sun ONE Grid Engine Administration and User's Guide. A copy of the guide can be found on the server in the file
/usr/local/sge/doc/SGE53AdminUserDoc.pdf