IRIDIA cluster: the boot process


Maintaining the cluster is much easier if one knows the boot sequences of the clients: once the sequences are known, it is easier to locate the source of an error and to identify which files should be modified.

There are two kinds of clients in the cluster: diskless (the shelf cluster) and with disk (the rack cluster). The boot sequence differs between the two, so each is explained separately.

Diskless boot process

Step 1 
The client is switched on. Its Ethernet card broadcasts a DHCP request on the network. The program in the card's firmware that handles this is called PXE. (PXE is actually one of several standards that can be used to boot over the network, and is developed mainly by Intel. 3Com cards, for instance, require procedures different from the one described here in order to receive a kernel to boot.)
 The BIOS of the client must be configured to enable PXE. Moreover, the network card should be set as the first boot device. 
Step 2 
majorana receives the request and starts a DHCP dialogue with the client. During the dialogue, the server tells the client its IP address, its name, the default gateways, and the name and time servers. Most importantly, it tells the client which computer to contact to receive the kernel (majorana again), which file to request (pxelinux.0), and where to find the root image to mount (on majorana).
 On majorana, the DHCP configuration is in /etc/dhcp.conf or /etc/dhcp3/dhcp.conf. pxelinux.0 is part of the syslinux package.
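As an illustration of what the server hands out in this step, the sketch below renders one host declaration in ISC dhcpd syntax. It is not a copy of the configuration on majorana: the host name shelf01, the MAC address, both IP addresses and the function name dhcp_host_stanza are assumptions made up for the example.

```python
def dhcp_host_stanza(name, mac, ip, boot_server, root_path):
    """Render one ISC dhcpd 'host' declaration for a PXE client."""
    lines = [
        "host %s {" % name,
        "  hardware ethernet %s;" % mac,         # identifies the client's card
        "  fixed-address %s;" % ip,              # IP address given to the client
        "  next-server %s;" % boot_server,       # TFTP server to contact
        '  filename "pxelinux.0";',              # file to request over TFTP
        '  option root-path "%s";' % root_path,  # root image to mount
        "}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # All values below are hypothetical.
    print(dhcp_host_stanza(
        name="shelf01",
        mac="00:11:22:33:44:55",
        ip="192.168.100.2",
        boot_server="192.168.100.1",
        root_path="/var/lib/diskless/default/root",
    ))
```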
Step 3 
The client starts a TFTP (Trivial FTP) connection to the server named in the DHCP reply.
Step 4 
The server receives the TFTP request, starts a TFTP server, and sends pxelinux.0, found in /var/lib/tftpboot/, to the client. The TFTP repository /var/lib/tftpboot/ is passed as a command-line argument to the TFTP server; that command line is specified in /etc/inetd.conf.
 The Debian way to modify this file is by using update-inetd.
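To make steps 3 and 4 more concrete, here is a minimal sketch of the read request that opens a TFTP transfer, following the packet layout of RFC 1350 (opcode 1, file name, transfer mode, each string NUL-terminated). The function name build_rrq is an assumption of this example; the client's firmware does this itself, so the code is only useful for understanding or testing.

```python
import struct

TFTP_OPCODE_RRQ = 1  # "read request" opcode defined by RFC 1350


def build_rrq(filename, mode="octet"):
    """Build the TFTP read-request packet for the given file name."""
    return (
        struct.pack("!H", TFTP_OPCODE_RRQ)  # 2-byte opcode, network byte order
        + filename.encode("ascii") + b"\x00"  # file name, NUL-terminated
        + mode.encode("ascii") + b"\x00"      # transfer mode, NUL-terminated
    )


if __name__ == "__main__":
    # The file name is relative to the TFTP repository on the server.
    print(build_rrq("pxelinux.0"))
```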
Step 5 
The client receives and executes pxelinux.0, which IS NOT the kernel! It is just a boot loader, like LILO or GRUB, with the difference that it works over the network. The client asks the server, via TFTP, for the boot configuration file (something like GRUB's menu.lst or LILO's lilo.conf). The file is expected to be in a subdirectory called pxelinux.cfg. pxelinux.0 then tries several different file names until one of them is found on the server and retrieved. The first one is the IP address of the client converted to hexadecimal (if the IP is 192.168.100.2, the file name is C0A86402). If it is not found, pxelinux.0 continues by removing the last character of the name until a matching name is found (C0A8640, C0A864, etc.). When the shortest name also fails, it tries the name default. For the diskless nodes, the TFTP directory contains one file per node.
/var/lib/tftpboot/pxelinux.cfg/*
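The file-name search described in this step can be reproduced with a few lines of Python, which is handy for working out which file a given node will ask for. This is a sketch of the sequence as described above, not code taken from pxelinux; the function name pxelinux_candidates is an assumption of this example.

```python
import ipaddress


def pxelinux_candidates(ip):
    """Return the configuration file names tried for a client, in order."""
    # Pack the four octets of the IPv4 address into eight uppercase hex digits.
    hex_name = "%08X" % int(ipaddress.IPv4Address(ip))
    # Drop one trailing hex digit at a time: C0A86402, C0A8640, ..., C
    names = [hex_name[:n] for n in range(len(hex_name), 0, -1)]
    # The last resort is the file literally named "default".
    names.append("default")
    return names


if __name__ == "__main__":
    for name in pxelinux_candidates("192.168.100.2"):
        print("pxelinux.cfg/" + name)
```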
Step 6 
The client receives and reads the configuration file, which specifies the name of the kernel to download and its parameters. The client then performs the last TFTP transfer to download the kernel.
The name specified in the configuration file must be the name of a file in /var/lib/tftpboot.
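When a node fails at this step, it helps to check which kernel its configuration file actually names. The sketch below extracts the KERNEL and APPEND values of each LABEL entry from a pxelinux-style configuration. The sample content, including the kernel name vmlinuz-diskless and its parameters, is hypothetical and does not reproduce the files on majorana; the parser only handles the three keywords shown.

```python
SAMPLE_CONFIG = """\
# hypothetical pxelinux.cfg entry
DEFAULT linux
LABEL linux
  KERNEL vmlinuz-diskless
  APPEND root=/dev/nfs ip=dhcp
"""


def parse_pxelinux_config(text):
    """Return {label: {"kernel": ..., "append": ...}} for each LABEL entry."""
    entries = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        keyword = parts[0].upper()  # keywords are matched case-insensitively
        value = parts[1].strip() if len(parts) > 1 else ""
        if keyword == "LABEL":
            current = entries.setdefault(value, {"kernel": None, "append": ""})
        elif keyword in ("KERNEL", "APPEND") and current is not None:
            current[keyword.lower()] = value
    return entries


if __name__ == "__main__":
    for label, entry in parse_pxelinux_config(SAMPLE_CONFIG).items():
        print(label, entry["kernel"], entry["append"])
```

The kernel name printed here is the one that must exist as a file in the TFTP repository.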
Step 7 
pxelinux.0 loads the kernel into memory and executes it. During boot, the kernel restarts the DHCP dialogue from step 1. It then mounts on / the NFS directory specified during the dialogue, called nfsroot.

The nfsroot is on majorana, in /var/lib/diskless/default/root/. The list of exported directories is in /etc/exports.

After each change to this file, the NFS server should be restarted with /etc/init.d/nfs-kernel-server restart.
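A quick way to confirm that the nfsroot is really exported is to list the directories named in the exports file. The sketch below does this for text in the usual /etc/exports layout (one exported path per line, followed by the allowed clients). The sample line, with its network and options, is an assumption for illustration and not the content of the file on majorana.

```python
SAMPLE_EXPORTS = """\
# hypothetical /etc/exports content
/var/lib/diskless/default/root 192.168.100.0/24(ro,no_root_squash)
"""


def exported_dirs(text):
    """Return the exported directory of every non-comment line."""
    dirs = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if line:
            dirs.append(line.split()[0])      # first field is the path
    return dirs


if __name__ == "__main__":
    nfsroot = "/var/lib/diskless/default/root"
    print(nfsroot in exported_dirs(SAMPLE_EXPORTS))
```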
Step 8 
The nfsroot contains the files and applications common to all diskless clients. However, each client needs some private areas for its own programs. One example is the /var directory, which contains, among other things, the log files. To be able to troubleshoot problems, each host must have its own logs, and therefore the /var directories must be separate. The same applies to /dev, /etc and /tmp. The private directories are on majorana. During boot, the client mounts its own private directories from the server. (The server uses the diskless package, which defines this division and structure, to manage and maintain all the directories.)

/var/lib/diskless/default/<CLIENT_IP>/dev, /var/lib/diskless/default/<CLIENT_IP>/etc, /var/lib/diskless/default/<CLIENT_IP>/tmp and /var/lib/diskless/default/<CLIENT_IP>/var.
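The layout above is regular enough to compute. The following sketch maps each private mount point of a client to its directory on majorana, which can help when looking for the logs of one specific node. The function name private_dirs and the example IP address are assumptions of this example.

```python
import posixpath

DISKLESS_BASE = "/var/lib/diskless/default"
PRIVATE_DIRS = ("dev", "etc", "tmp", "var")


def private_dirs(client_ip):
    """Map each private mount point to its server-side directory."""
    return {
        "/" + name: posixpath.join(DISKLESS_BASE, client_ip, name)
        for name in PRIVATE_DIRS
    }


if __name__ == "__main__":
    for mount_point, server_dir in sorted(private_dirs("192.168.100.2").items()):
        print(mount_point, "<-", server_dir)
```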

Step 9 
The client proceeds with the normal Linux boot, activating all the services specified for the default run-level (2).

/var/lib/diskless/default/<CLIENT_IP>/etc/inittab and

/var/lib/diskless/default/<CLIENT_IP>/etc/rc2.d/.
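To check which run-level a client will actually enter, one can read the initdefault entry of its inittab. The sketch below assumes the classic sysvinit line format id:runlevels:action:process; the sample content is illustrative and not copied from a node.

```python
SAMPLE_INITTAB = """\
# The default runlevel.
id:2:initdefault:
si::sysinit:/etc/init.d/rcS
"""


def default_runlevel(text):
    """Return the run-level of the 'initdefault' entry, or None if absent."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) >= 3 and fields[2] == "initdefault":
            return fields[1]
    return None


def rc_directory(runlevel):
    """Directory, relative to the client's root, holding that level's services."""
    return "etc/rc%s.d" % runlevel


if __name__ == "__main__":
    level = default_runlevel(SAMPLE_INITTAB)
    print(level, rc_directory(level))
```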

Rack boot sequence

The boot sequence of the computers in the rack, those with disks, is much simpler, but there are two different sequences! The server that takes care of these computers is majorana, and it uses FAI (Fully Automatic Installation) to manage the clients. A client can boot either to start a new installation or for normal use. The command fai-chboot on majorana selects which boot is performed. For instance, fai-chboot -IBv r02 sets the installation boot sequence for r02, and fai-chboot -r r02 sets the normal one (RTFM! man fai-chboot).

The first steps are the same for both sequences, and are basically those for the disk-less boot:

Step 1 
The client is switched on. Its Ethernet card broadcasts a DHCP request on the network. The program in the card's firmware that handles this is called PXE. (PXE is actually one of several standards that can be used to boot over the network, and is developed mainly by Intel. 3Com cards, for instance, require procedures different from the one described here in order to receive a kernel to boot.)
 The BIOS of the client must be configured to enable PXE. Moreover, the network card should be set as the first boot device. 
Step 2 
majorana receives the request and starts a DHCP dialogue with the client. During the dialogue, the server tells the client its IP address, its name, the default gateways, and the name and time servers. Most importantly, it tells the client which computer to contact to receive the kernel (majorana again), which file to request (pxelinux.0), and where to find the root image to mount (on majorana).
 On majorana, the DHCP configuration is in /etc/dhcp.conf or /etc/dhcp3/dhcp.conf. pxelinux.0 is part of the syslinux package.
Step 3 
The client starts a TFTP (Trivial FTP) connection to the server named in the DHCP reply.
Step 4 
The server receives the TFTP request, starts a TFTP server, and sends pxelinux.0, found in /var/lib/tftpboot/, to the client. The TFTP repository /var/lib/tftpboot/ is passed as a command-line argument to the TFTP server; that command line is specified in /etc/inetd.conf.
 The Debian way to modify this file is by using update-inetd.
Step 5 
The client receives and executes pxelinux.0, which IS NOT the kernel! It is just a boot loader, like LILO or GRUB, with the difference that it works over the network. The client asks the server, via TFTP, for the boot configuration file (something like GRUB's menu.lst or LILO's lilo.conf). The file is expected to be in a subdirectory called pxelinux.cfg. pxelinux.0 then tries several different file names until one of them is found on the server and retrieved. The first one is the IP address of the client converted to hexadecimal (if the IP is 192.168.100.2, the file name is C0A86402). If it is not found, pxelinux.0 continues by removing the last character of the name until a matching name is found (C0A8640, C0A864, etc.). When the shortest name also fails, it tries the name default, which is actually the only one present on the server.
/var/lib/tftpboot/pxelinux.cfg/default

Normal boot sequence

For a normal boot, the search for the PXE configuration file falls through to default, which instructs the client to boot from the local disk. The normal Linux boot sequence then takes place. It is also possible to use fai-chboot on majorana to create a PXE configuration file that instructs the client to perform a local boot:

fai-chboot -o <node_name>
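For scripting several nodes at once, the three fai-chboot invocations quoted in this section can be wrapped in a small helper. The sketch below only builds the argument list and does not run anything; the mode names install, normal and local are labels invented for this example, while the flags are the ones given in the text. Check man fai-chboot before relying on them.

```python
# Flags as quoted in this document; the keys are hypothetical labels.
FAI_CHBOOT_FLAGS = {
    "install": "-IBv",  # installation boot sequence
    "normal": "-r",     # normal boot sequence
    "local": "-o",      # explicit local-boot configuration file
}


def fai_chboot_command(mode, node):
    """Return the fai-chboot argument list for the given mode and node."""
    try:
        flags = FAI_CHBOOT_FLAGS[mode]
    except KeyError:
        raise ValueError("unknown boot mode: %r" % mode)
    return ["fai-chboot", flags, node]


if __name__ == "__main__":
    for node in ("r01", "r02"):  # hypothetical node names
        print(" ".join(fai_chboot_command("install", node)))
```

On majorana, the returned list could be passed to subprocess.run to execute the command.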

Installation boot sequence

In the case of an installation boot, there is a file in /boot/fai/pxelinux.cfg that corresponds to the IP address of the client. The file names a valid kernel in /boot/fai, called the installation kernel. FAI can be instructed to use any kernel for the installation. (/boot/fai is actually a link to /var/lib/tftpboot.)

See all the files in /etc/fai on majorana, and the man pages of fai-setup
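Which of the two rack sequences a node follows depends only on which file the PXE search finds first. The sketch below picks the winning file from the set of names present in pxelinux.cfg, under the search order described for pxelinux.0 in this document. The function name select_config and the example file sets are assumptions of this example.

```python
import ipaddress


def select_config(ip, files_present):
    """Return the first configuration file name found, or None."""
    hex_name = "%08X" % int(ipaddress.IPv4Address(ip))
    # Full hexadecimal name first, then shorter prefixes, then "default".
    candidates = [hex_name[:n] for n in range(8, 0, -1)] + ["default"]
    for name in candidates:
        if name in files_present:
            return name
    return None


if __name__ == "__main__":
    # Only "default" exists: the node boots from its local disk.
    print(select_config("192.168.100.2", {"default"}))
    # A per-node file exists: it takes precedence over "default".
    print(select_config("192.168.100.2", {"default", "C0A86402"}))
```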

It is beyond the scope of this document to explain how to configure and use FAI. We refer the reader to the documentation that comes with the FAI package, in /usr/share/doc/fai/ on majorana (RTFM!).

Then a diskless-like boot process takes place:

Step 6 
The client receives and reads the configuration file, which specifies the name of the kernel to download and its parameters. The client then performs the last TFTP transfer to download the kernel.
The name specified in the configuration file must be the name of a file in /var/lib/tftpboot.
Step 7 
pxelinux.0 loads the kernel into memory and executes it. During boot, the kernel restarts the DHCP dialogue from step 1. It then mounts on / the NFS directory specified during the dialogue, called nfsroot.

The nfsroot is on majorana, in /var/lib/diskless/default/root/. The list of exported directories is in /etc/exports.

After each change to this file, the NFS server should be restarted with /etc/init.d/nfs-kernel-server restart.
Step 8 
The client executes the installation script that comes with FAI. The first step is to mount the configuration directory in /fai on the client. The content of this directory is extensively explained in the FAI documentation.

The configuration directory is on majorana in /usr/local/share/fai.

Step 9 
The client is installed and configured according to the instructions in the configuration files. To speed up installation, the clients retrieve the packages from polyphemus, which fetches them from the official repositories and caches them on its disk.

The cache is in /mnt/disk1/apt-cacher, and is maintained by the program apt-cacher. Its configuration files are in /etc/apt-cacher/ on polyphemus. The clients use the repository list specified in /etc/fai/sources.list (on majorana) during the installation, and a copy of the one under /usr/local/share/fai/ (on majorana) afterwards.

Step 10 
When the installation is over, the client executes fai-chboot on the server in order to select a normal boot for itself. It then copies the log files of the installation to the server and reboots.

The log files are in /mnt/disk1/fai/ on polyphemus. The latest logs are always in /mnt/disk1/fai/<CLIENT_NAME>/last-install.

Step 11 
The normal boot sequence takes place.