IRIDIA cluster architecture
Revision as of 11:34, 1 July 2005
Introduction
The cluster was built in 2002 and has been extended and modified since. Currently it consists of two servers and a number of nodes, some with disks and some without. The diskless nodes are the ordinary PC-like boxes on the shelves in the server room, while the nodes with disks are the ones in the rack.
Currently, both servers and nodes run 32-bit Debian GNU/Linux; however, the nodes in the rack are dual Opterons (64-bit capable), so this might change at some point in the future.
majorana is the main server and provides the following services:
- NTP
- NIS
- Sun Grid Engine (the actual scheduler runs on polyphemus)
- Vortex License
- DHCP for the computers on the shelves
- TFTP for diskless booting
Normally, users will log on to polyphemus and submit jobs using the Sun Grid Engine.
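As a rough illustration of this workflow, the Python sketch below only assembles the `qsub` command line a user on polyphemus might type; it does not submit anything. The script name `run_experiment.sh` and the job name are hypothetical. The two flags used, `-cwd` (run the job in the submission directory) and `-N` (set the job name), are standard Sun Grid Engine `qsub` options, but the local installation may additionally require queue or resource options not shown here.

```python
import shlex


def build_qsub_command(script, job_name, cwd=True):
    """Return the argv for submitting `script` to Sun Grid Engine.

    This only builds the command. Actually submitting it (for example with
    subprocess.run) has to happen on a host where SGE is installed.
    """
    argv = ["qsub"]
    if cwd:
        argv.append("-cwd")       # run the job in the directory it was submitted from
    argv += ["-N", job_name]      # job name, as shown by qstat
    argv.append(script)           # job script to execute on a compute node
    return argv


if __name__ == "__main__":
    # Hypothetical script and job name, for illustration only.
    cmd = build_qsub_command("run_experiment.sh", "my_experiment")
    print(shlex.join(cmd))
```

Keeping command construction separate from execution makes it easy to inspect what would be sent to the scheduler before actually running it.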
Physical setup
Redundancy and replication
Note that only some of the things mentioned below have actually been installed; this should be considered merely a wish list or a list of ideas.
Two servers provide two access points to the cluster. The services required by the nodes are split between the servers to reduce the workload and to improve robustness to failures. The following list describes which services can be duplicated and what needs to be done in case of crashes.
- The NIS protocol already supports more than one server, of which only one is the master server. The others are slave servers that hold a copy of the master and work only when the master is unreachable.
- NTP is used to keep the clocks of the cluster synchronized. The clients can access only one server (to be checked), although there might be more in the network. A failure of the NTP server is not considered critical, because it will take days before the clocks of the clients differ unreasonably. Therefore, one server is enough.
- The Vortex License server cannot be copied, and it is already configured so that it can run only on polyphemus. If polyphemus crashes, it is still possible to start it on majorana by changing the MAC address of the latter to 00:0C:6E:02:41:C3 (polyphemus's MAC address). majorana can copy the file needed to run the server on a daily basis.
- Sun Grid Engine (SGE) can run on only one computer (polyphemus). All the nodes access its data via NFS. majorana can copy the SGE directory daily, but if polyphemus crashes, all the nodes must be instructed to mount the copied directory on majorana.
- /home directories. There can be only one NFS server for /home in the network. majorana was chosen because it reacts faster: it has 2 CPUs, so when one is busy writing, the other can still process incoming requests. Both majorana and polyphemus use RAID to prevent data loss. The only problem arises if majorana becomes unreachable: in that case, every process on the nodes that tries to access /home will block until majorana comes up again.
- The root directories of the diskless nodes are on majorana. If majorana is not reachable, these nodes will block, waiting for majorana to come up again. polyphemus could keep a backup of these directories, but if the administrator wants to mount the backup directories from polyphemus, the nodes must be rebooted manually because they are not reachable via SSH. The DHCP configuration must also be changed to give the nodes the new mount point of the backup directories.
- majorana is also the DHCP server for the whole local network. Two groups are defined in its configuration file: one for the diskless nodes and one for the computers in the rack.
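The failure scenarios above can be summarised in a small Python model, for instance to check quickly which services are affected when one server goes down. The service-to-host mapping follows the redundancy list on this page (Vortex License and SGE on polyphemus; NIS, NTP, /home, the diskless root directories and DHCP on majorana), and the recovery notes paraphrase that list. The `Service` class and the `affected_by` function are hypothetical helpers, not existing cluster tools, and like the list itself the model describes a partly unimplemented wish list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    host: str        # server the service normally runs on
    on_failure: str  # consequence or recovery action if that host fails


# Mapping taken from the redundancy list (not everything is actually installed).
SERVICES = [
    Service("NIS", "majorana", "slave servers answer while the master is unreachable"),
    Service("NTP", "majorana", "not critical: clocks only drift noticeably after days"),
    Service("Vortex License", "polyphemus", "start it on majorana using polyphemus's MAC address"),
    Service("Sun Grid Engine", "polyphemus", "make all nodes mount the SGE copy on majorana"),
    Service("/home (NFS)", "majorana", "processes accessing /home block until majorana is back"),
    Service("diskless root directories", "majorana", "nodes block; reboot them manually onto a backup"),
    Service("DHCP", "majorana", "no fallback described on this page"),
]


def affected_by(failed_host):
    """Return the services normally provided by `failed_host`."""
    return [s for s in SERVICES if s.host == failed_host]


if __name__ == "__main__":
    for s in affected_by("polyphemus"):
        print(f"{s.name}: {s.on_failure}")
```

A flat list is enough here because every service has exactly one primary host; services with real replicas would need a list of hosts instead.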
Electric system and power plugs
Currently the cluster is connected to the power system through 5 lines: 21, 22, 23, 24, and 25.
- 21: connects the servers iridia, polyphemus, 2 UPS, 1 monitor and 1 switch D-Link DGS-1004T (8W)
- 22: connects the 6 nodes at the top of the rack
- 23: connects 7 diskless nodes (the left part of the shelves): p02, p03, p04, p05, p06, p07 and p08
- 24: connects 1 switch D-Link DGS-1016T (40W), 1 switch D-Link DGS-1024T (75W) and 8 diskless nodes (the right part of the shelves): p15, p17, p18, p19, p20, p21, p22, and p23
- 25: connects 9 nodes of the rack, 1 switch Netgear GS-524T (70W), the rack UPS (54W) and the server majorana
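The list above only gives wattages for the switches and the rack UPS, so the load on each line can only be bounded, not computed. The Python sketch below records the device counts and documented wattages per line and adds an assumed per-node consumption. The `PowerLine` class and the 100 W per node figure are hypothetical placeholders; servers, the monitor and the UPS units without a figure are left out, so the estimate is a lower bound.

```python
from dataclasses import dataclass, field


@dataclass
class PowerLine:
    number: int
    nodes: int           # compute nodes (diskless or rack) on this line
    other_devices: list  # servers, UPS units, switches, monitors
    known_watts: dict = field(default_factory=dict)  # only wattages documented on this page


LINES = [
    PowerLine(21, 0, ["iridia", "polyphemus", "UPS", "UPS", "monitor", "D-Link DGS-1004T"],
              {"D-Link DGS-1004T": 8}),
    PowerLine(22, 6, []),
    PowerLine(23, 7, []),
    PowerLine(24, 8, ["D-Link DGS-1016T", "D-Link DGS-1024T"],
              {"D-Link DGS-1016T": 40, "D-Link DGS-1024T": 75}),
    PowerLine(25, 9, ["Netgear GS-524T", "rack UPS", "majorana"],
              {"Netgear GS-524T": 70, "rack UPS": 54}),
]


def known_load(line):
    """Sum of the wattages documented for this line."""
    return sum(line.known_watts.values())


def estimated_load(line, watts_per_node):
    """Documented wattage plus an assumed draw per compute node.

    Servers, the monitor and UPS units without a figure are not counted,
    so the result is a lower bound rather than the real load.
    """
    return known_load(line) + line.nodes * watts_per_node


if __name__ == "__main__":
    ASSUMED_WATTS_PER_NODE = 100  # hypothetical figure; measure the real nodes
    for line in LINES:
        print(f"line {line.number}: {line.nodes} nodes, "
              f"documented {known_load(line)} W, "
              f"estimated >= {estimated_load(line, ASSUMED_WATTS_PER_NODE)} W")
```

Replacing the assumed per-node value with measured figures would turn this into a quick check of how close each line is to its circuit limit.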