IRIDIA cluster maintenance

From IridiaWiki
Revision as of 16:05, 17 February 2005 by Christensen (talk | contribs)

This page contains information on maintaining the cluster, including installing new software, adding and removing nodes, security, etc.

Adding a new diskless node

For a new client to be fully functional, the server must first be configured to allow the client to boot from the network. Then the new client must be added to the client list of the Sun Grid Engine (SGE). The current client kernel assumes that the client has an Intel PRO/1000 network card. At the moment, other cards require recompiling the kernel and making other modifications to the net-booting process.

  1. switch the client on while it is attached to a keyboard and a monitor;
  2. enter the BIOS and configure the client not to halt at boot when the keyboard, video card, floppy drive, or any other device is missing;
  3. configure it to boot from LAN;
  4. let it boot and, if the MAC address of the network card appears on screen, write it down; then switch the client off.

Finding the MAC address of a new client

The MAC address is a sequence of 12 hexadecimal digits, normally written as six pairs separated by colons or spaces. If you do not have it, you can obtain it as follows.

On the server, type the following:

tail -f /var/log/daemon.log

Switch on the client and let it boot from the network (it will fail). Now watch the server's screen; a line like the following will appear:

DHCPDISCOVER from 00:13:16:69:71:fa via eth1

The digits between "from" and "via" are the MAC address.
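
If the log is busy, the MAC address can be pulled out of such lines with a small filter. The following is only a sketch: the function name extract_mac is hypothetical, and it assumes the log lines have the form "DHCPDISCOVER from <MAC> via <interface>" as in the sample above.

```shell
#!/bin/sh
# Print the MAC address found in each DHCPDISCOVER line read from standard input.
# Assumption: lines look like "... DHCPDISCOVER from <MAC> via <interface>",
# with the MAC written as six colon-separated pairs (17 characters).
extract_mac() {
    sed -n 's/.*DHCPDISCOVER from \([0-9A-Fa-f:]\{17\}\) via .*/\1/p'
}

# Example with the sample line from this page:
echo 'DHCPDISCOVER from 00:13:16:69:71:fa via eth1' | extract_mac
```

On the server, the same function could be fed from the log, for example with "grep DHCPDISCOVER /var/log/daemon.log | extract_mac | sort -u", which lists every distinct address that has asked for a lease.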

Next, the final steps. Suppose that the MAC address is 00:13:16:69:71:fa, the new host name is p69, and its IP address is 192.168.100.69. On the server, edit the file

/etc/dhcpd.conf

Search for the block where the other nodes are defined (for instance, look for "host p02") and add the following after the last definition in the group:

host p69 {
        hardware ethernet 00:13:16:69:71:fa;
        fixed-address 192.168.100.69;
}
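
When several nodes are added at once, typing each entry by hand is error-prone. The sketch below prints an entry in the same layout as the one above; the function name dhcp_host_entry is hypothetical, and the output still has to be pasted into /etc/dhcpd.conf by hand.

```shell
#!/bin/sh
# Print a dhcpd.conf host entry for a host name, MAC address and IP address.
# The function only writes to standard output; it does not modify any file.
dhcp_host_entry() {
    name=$1
    mac=$2
    ip=$3
    printf 'host %s {\n' "$name"
    printf '        hardware ethernet %s;\n' "$mac"
    printf '        fixed-address %s;\n' "$ip"
    printf '}\n'
}

# Example with the values used on this page:
dhcp_host_entry p69 00:13:16:69:71:fa 192.168.100.69
```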

Execute

/etc/init.d/dhcp restart

Add the new host to /etc/hosts:

...
192.168.100.69  p69
...
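
To avoid listing the same address twice, the line can be appended only when the IP is not already present. This is a minimal sketch, not part of the cluster's tooling: the function name add_host_entry is hypothetical, and it takes the file as its first argument so that it can be tried on a copy before touching /etc/hosts.

```shell
#!/bin/sh
# Append "<ip>  <name>" to a hosts file unless the IP address is already listed.
# Returns 1 and leaves the file unchanged if the address is found.
add_host_entry() {
    file=$1
    ip=$2
    name=$3
    # Compare the first field of every line exactly against the IP address.
    if awk -v ip="$ip" '$1 == ip { found = 1 } END { exit !found }' "$file"; then
        echo "$ip is already listed in $file" >&2
        return 1
    fi
    printf '%s  %s\n' "$ip" "$name" >> "$file"
}
```

For the example on this page the call would be "add_host_entry /etc/hosts 192.168.100.69 p69", run as root.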

Add the entries that export the new client's file system to /etc/exports:

/var/lib/diskless/default/192.168.100.69/etc 192.168.100.69(rw,no_root_squash)
/var/lib/diskless/default/192.168.100.69/rw 192.168.100.69(ro,no_root_squash)
/var/lib/diskless/default/192.168.100.69/rw-secure 192.168.100.69(rw,no_root_squash)
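
The three lines differ only in the directory name and the access mode, so they can be generated from the IP address. The sketch below reproduces the paths and options exactly as listed above; the function name diskless_exports is hypothetical, and the output is meant to be reviewed before being added to /etc/exports.

```shell
#!/bin/sh
# Print the /etc/exports lines for one diskless client, given its IP address.
# Paths and options are copied from the example entries on this page.
diskless_exports() {
    ip=$1
    base=/var/lib/diskless/default/$ip
    printf '%s/etc %s(rw,no_root_squash)\n' "$base" "$ip"
    printf '%s/rw %s(ro,no_root_squash)\n' "$base" "$ip"
    printf '%s/rw-secure %s(rw,no_root_squash)\n' "$base" "$ip"
}

# Example with the address used on this page:
diskless_exports 192.168.100.69
```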

Restart the NFS server:

/etc/init.d/nfs-kernel-server restart

And finally, execute:

update-host-directories

Then the host must be included in the Sun Grid Engine. Read and follow the instructions in the Sun ONE Grid Engine Administration and User's Guide, Chapter 2, "How to Install Execution Host". A copy of the guide can be found on the server in the file

/usr/local/sge/doc/SGE53AdminUserDoc.pdf