IRIDIA cluster todo
From IridiaWiki
Revision as of 12:52, 22 February 2005
This page contains a list of items which still need to be done on the cluster.
Errors & problems:
- There is an intermittent error when installing the clients in the rack with FAI: the clients start printing a large amount of output on screen, but it scrolls too fast to be read. I could not find any way to pause or capture it.
- Neither yppasswd nor passwd works on the clients of the NIS domain. Users have to change their password from majorana.
Improvements:
- A daemon that checks the status of the UPS should be installed on both majorana and polyphemus.
- Make the configuration of the packages on the diskless nodes and on the rack more similar. At the moment FAI only takes care of modifying the important configuration files in /etc.
- Use one repository for the configuration of those packages which use debconf. debconf can access configuration databases shared via NFS or by querying an LDAP server.
- Create a script to automatically install/upgrade packages on the clients.
- Set up the backup server to automatically backup configuration files on the cluster.
- Move the SGE scheduler to majorana, and configure polyphemus to be a submission host only.
- Add a caching DNS server for the local network, so as to reduce the load (and possible problems) on the official DNS.
- Modify the update-cluster scripts on majorana so as to create different dsh groups: athlon*, opteron*, diskless, rack, etc. This can be done by encoding the group information in a special way in the notes of each node.
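The dsh-groups idea above could be sketched as follows. This is only an illustration, not the actual update-cluster script: the inventory dict, the "group=" note convention, and the function names are all hypothetical, and dsh group files would normally live under /etc/dsh/group/ or ~/.dsh/group/, one hostname per line.

```python
# Hypothetical sketch: derive dsh groups from per-node notes.
# The NODES inventory and the "group=<name>" note convention are
# assumptions for illustration, not the cluster's real setup.
from collections import defaultdict

# Illustrative inventory: hostname -> note string, with the group
# membership encoded in each node's note as suggested above.
NODES = {
    "athlon01": "group=athlon group=rack",
    "athlon02": "group=athlon group=rack",
    "opteron01": "group=opteron group=diskless",
}

def build_groups(nodes):
    """Collect hostnames into the dsh groups named in each node's note."""
    groups = defaultdict(list)
    for host, note in sorted(nodes.items()):
        for token in note.split():
            if token.startswith("group="):
                groups[token[len("group="):]].append(host)
    return dict(groups)

# Each resulting group would be written to /etc/dsh/group/<name>,
# one hostname per line, so that `dsh -g athlon ...` works.
for name, hosts in build_groups(NODES).items():
    print(name, hosts)
```

A naming convention in the notes (rather than parsing hostnames) keeps overlapping groups possible, e.g. a node can be in both "athlon" and "rack".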
Wishlist:
- Install Ganglia to monitor the usage of the cluster via the web.
- Install LDAP instead of NIS (only if it proves better and actually works).
- Install a new version of Sun Grid Engine (or something else).
- Install Bugzilla to track problems on the cluster (and to have a knowledge base of how to solve them!).