IRIDIA cluster todo

From IridiaWiki

Latest revision as of 17:36, 13 February 2006

This page contains a list of items which still need to be done on the cluster.

Errors & problems:

  • There is a random error when installing the clients in the rack with FAI: the clients start printing a lot of output, but it scrolls too fast to be read, and I could not find any way to pause it.
  • Neither yppasswd nor passwd works on the clients of the NIS domain. Users have to change their password from majorana.
  • Max: for LAM/MPI to work, the rsh alternative must be set to option 2 (ssh) on each node with update-alternatives --config rsh (TODO: add this command to the FAI configuration files of the node image).
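The TODO for the FAI node image could look roughly like the sketch below. It is written in Python only for illustration, and it rests on two assumptions that are not confirmed for this cluster: that the ssh choice of the rsh alternative is /usr/bin/ssh, and that a ROOTCMD variable may hold a chroot prefix such as "chroot /target" during installation. Using update-alternatives --set with an explicit path, rather than picking "option 2" from the --config menu, avoids depending on how that menu happens to be numbered on a given node.

```python
import os
import shlex
import subprocess
import sys

# Assumption: the ssh choice of the "rsh" alternative is /usr/bin/ssh,
# i.e. what "option 2" pointed to when the setting was made by hand.
SSH_PATH = "/usr/bin/ssh"


def rsh_to_ssh_command(rootcmd=""):
    """Return the update-alternatives call, prefixed by an optional chroot wrapper."""
    # --set selects a specific path non-interactively, so the result does not
    # depend on the numbering of the interactive --config menu.
    return shlex.split(rootcmd) + ["update-alternatives", "--set", "rsh", SSH_PATH]


if __name__ == "__main__":
    # Hypothetical: ROOTCMD holds a wrapper such as "chroot /target" during
    # installation; if it is unset, the command acts on the running system.
    command = rsh_to_ssh_command(os.environ.get("ROOTCMD", ""))
    if "--apply" in sys.argv[1:]:
        subprocess.run(command, check=True)
    else:
        print(shlex.join(command))  # dry run: only show what would be executed
```

Without arguments the script only prints the command; --apply executes it. The same update-alternatives --set rsh /usr/bin/ssh line could equally go into a shell script in the FAI configuration space.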

Improvements:

  • A daemon that checks the status of the UPS should be installed on both majorana and polyphemus.
  • Make the package configuration on the diskless clients and on the rack clients more similar. At the moment FAI only takes care of modifying the important configuration files in /etc.
  • Use a single repository for the configuration of the packages that use debconf. debconf can access configuration databases shared via NFS or obtained by querying an LDAP server.
  • Create a script to automatically install/upgrade packages on the clients.
  • Set up the backup server to automatically back up the configuration files on the cluster.
  • Move the SGE scheduler to majorana, and configure polyphemus to be only a submission host.
  • Add a DNS server that caches queries from the local network, so as to reduce the load (and possible problems) on the official DNS.
  • Modify the update-cluster scripts on majorana to create different dsh groups: athlon*, opteron*, diskless, rack, etc. This could be done by encoding the information in a special format in the notes of each node.
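For the script that installs or upgrades packages on the clients, a minimal sketch follows. It assumes the clients run Debian and accept key-based ssh logins as root from majorana; apt_command and run_on_clients are hypothetical names, and the host and package names are supplied by the caller. By default it only prints the ssh commands it would run.

```python
import shlex
import subprocess


def apt_command(packages=()):
    """Build the remote shell command: install the given packages, or upgrade everything."""
    steps = ["apt-get update"]
    if packages:
        steps.append("apt-get -y install " + " ".join(shlex.quote(p) for p in packages))
    else:
        steps.append("apt-get -y upgrade")
    # Non-interactive frontend so debconf does not stop to ask questions.
    return "export DEBIAN_FRONTEND=noninteractive && " + " && ".join(steps)


def run_on_clients(hosts, packages=(), dry_run=True):
    """Run the apt command on each host over ssh; return {host: exit status}."""
    remote = apt_command(packages)
    results = {}
    for host in hosts:
        cmd = ["ssh", "root@" + host, remote]
        if dry_run:
            print(shlex.join(cmd))  # only show what would be executed
            results[host] = 0
        else:
            results[host] = subprocess.run(cmd).returncode
    return results


if __name__ == "__main__":
    # Made-up host names; pass dry_run=False to actually run the commands.
    run_on_clients(["node01", "node02"])
```

Setting DEBIAN_FRONTEND=noninteractive keeps debconf from prompting on each node; packages whose questions need non-default answers would still have to be preseeded.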
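One possible way to derive the dsh groups from the node notes is sketched below. The note format ("cpu=opteron; install=rack") and the node names are hypothetical, since the format actually used by the update-cluster scripts is not documented here. The output follows dsh's convention of one group file per group, with one host name per line, in /etc/dsh/group/.

```python
from collections import defaultdict
from pathlib import Path


def parse_tags(notes):
    """Turn 'cpu=opteron; install=rack' into {'cpu': 'opteron', 'install': 'rack'}."""
    tags = {}
    for field in notes.split(";"):
        if "=" in field:  # fields without '=' are ignored
            key, value = field.split("=", 1)
            tags[key.strip()] = value.strip()
    return tags


def build_groups(node_notes):
    """Map each tag value (e.g. 'opteron', 'rack') to the sorted list of nodes carrying it."""
    groups = defaultdict(set)
    for node, notes in node_notes.items():
        for value in parse_tags(notes).values():
            groups[value].add(node)
    return {name: sorted(nodes) for name, nodes in groups.items()}


def write_dsh_groups(groups, group_dir="/etc/dsh/group"):
    """Write one dsh group file per group, one host name per line."""
    directory = Path(group_dir)
    directory.mkdir(parents=True, exist_ok=True)
    for name, nodes in groups.items():
        (directory / name).write_text("\n".join(nodes) + "\n")


if __name__ == "__main__":
    # Hypothetical notes for three made-up nodes.
    example = {
        "c0-1": "cpu=athlon; install=diskless",
        "c1-1": "cpu=opteron; install=rack",
        "c1-2": "cpu=opteron; install=rack",
    }
    print(build_groups(example))
```

Once the group files exist, a command can be sent to a single group with something like dsh -M -c -g opteron -- uptime.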

Wishlist:

  • Install Ganglia to monitor cluster usage via the web.
  • Install LDAP instead of NIS (only if it is better or it works).
  • Install a new version of Sun Grid Engine (or something else).
  • Install Bugzilla to track problems on the cluster (and to build a knowledge base of how to solve them!).