Difference between revisions of "Administration weekly meetings"
From IridiaWiki
Revision as of 13:50, 19 January 2006
Previous administration meeting
Thursday 15th December
Agenda
- New Wiki pages
- New conferences
- Christos: There are 40 EUR left from the coffee machine contribution. Let's discuss how to spend this money: one idea is to buy some common lab coffee that we can all use, or even a couple of cakes. Let's discuss this, because about 1.9 EUR of it comes from each of us!
- Max: Issues with NFS on the IRIDIA Cluster.
- The solution adopted by Prasanna for his experiments relieves some "stress" from the hard drive (the bottleneck for NFS).
- Max: Memory for the nodes of IRIDIA Cluster.
- I have installed 4*512MB on r17 and 8*256MB on r04. Some quick tests seem to show a small difference in performance.
- Rodi: Policy for IRIDIA Tech Report
- Marco/Mauro: Backup databases
- Anders: Coffee machine and decalcification?
Results
- New Wiki pages:
- Updated LaTeX package for technical reports - can be downloaded from IRIDIA Technical reports.
- New Conferences
- "IEEE 2006 Workshop on Distributed Intelligent Systems" added to Robotics and AI conferences, journals and impact factors
- Cluster and NFS: Frequent reading and writing over NFS on the cluster is slow. Use /tmp (local on each node) for intermediate files; this can give a 3x speed-up. A Wiki page should be made for this type of programming-for-the-cluster trick (Max).
- Cluster and RAM: Max and Rodi are currently testing different memory configurations for the cluster.
- IRIDIA technical reports and dates: The date of the technical report is the day on which you get the technical report number from Muriel. This should prevent situations where a technical report from e.g. 2004 references publications from 2005.
- Database backup: The DB backup should be done using mysqldump and not by copying files. Alex is responsible.
- Coffee machine: Tom is responsible for hunting down the coffee machine manual and for decalcifying the machine.
- New permanent point for the weekly admin meetings: Rodi suggested adding a new default item to the agendas for the admin meetings, namely "Announcements and Publications". The idea is that anyone who is about to publish something, or has just published something, briefly tells everyone else about it. Rodi's creative suggestion was accepted without much bloodshed.
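The /tmp advice from the "Cluster and NFS" item above can be sketched as follows. This is only a minimal illustration, not the cluster's official procedure: the helper names `run_with_local_scratch` and `example_work`, the `job_` prefix, and the file names are all hypothetical, and the result directory stands in for any location on NFS.

```python
import os
import shutil
import tempfile


def run_with_local_scratch(result_dir, work, scratch_root="/tmp"):
    """Run work(scratch_dir) on node-local disk and copy only the final file to result_dir.

    result_dir is assumed to live on NFS; scratch_root is assumed to be local to the node.
    """
    os.makedirs(result_dir, exist_ok=True)
    # Private scratch directory on the local disk, so concurrent jobs do not collide.
    scratch_dir = tempfile.mkdtemp(prefix="job_", dir=scratch_root)
    try:
        result_path = work(scratch_dir)
        # A single copy over NFS at the end instead of many small reads and writes.
        return shutil.copy(result_path, result_dir)
    finally:
        # Always free the local scratch space, even if the job fails.
        shutil.rmtree(scratch_dir, ignore_errors=True)


def example_work(scratch_dir):
    """Toy job: write an intermediate file, read it back, and write one result file."""
    partial = os.path.join(scratch_dir, "partial.txt")
    with open(partial, "w") as f:
        for i in range(1000):
            f.write(f"{i}\n")
    with open(partial) as f:
        total = sum(int(line) for line in f)
    result = os.path.join(scratch_dir, "result.txt")
    with open(result, "w") as f:
        f.write(f"{total}\n")
    return result
```

A job would call `run_with_local_scratch("<directory on NFS>", example_work)`; cleaning up in the `finally` block matters because the /tmp space on each node is shared by all jobs running there.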
Thursday 12th January
Agenda
- New Wiki pages
- New conferences
- Announcements and Publications
- Anders: Cluster queues and nice levels on the cluster - it seems like they need to be balanced if the "long" jobs are supposed to ever finish.
- Max: Cluster maintenance - I need to shut down the cluster for 1 day in order to perform some maintenance tasks (we received the new RAM and the new power cord).
Results
- New Wiki pages
- New conferences: None
- Announcements and Publications: Rodi, Tom, Francisco and Anders had new publications.
- Anders: Cluster queues and nice levels on the cluster
- The cluster queues will be adjusted.
- Max: The queues have been reniced like this: short=nice 2, medium=nice 3, long=nice 3, par=nice 3
- Max: Cluster maintenance
- Don't submit any jobs to the cluster before the maintenance has been performed. Submitters need to provide an estimate of the time needed for the jobs currently in the queue. Max will let everyone know when the cluster maintenance is going to take place as soon as he knows.
- Max: Status on Jan 13 - Prasanna's job should finish in 2-3 days; Tom has not yet given me a forecast.
Thursday 19th January
Agenda
- New Wiki pages
- New conferences
- Announcements and Publications
- Max: Cluster and queueing system: To avoid a node being choked by too many concurrent jobs (this was happening when nodes were heavily using the swap space), we need to tune the number of slots per queue.
Results
- New Wiki pages
- New conferences: None
- Announcements and Publications
- Cluster and queueing system
- 1GB of memory has been added to each node of the cluster. Each node now has 2GB of RAM, 4.5GB of swap space, and 20GB of /tmp space for local data storage. Nodes r02 to r17 have 4*512MB DDR ECC REG DIMM, while nodes r18 to r33 have 8*256MB DDR ECC REG DIMM.
- The queues have been reniced like this: short=nice 2, medium=nice 3, long=nice 3, par=nice 2
- The slots per queue will be changed as follows: 3 slots in the short queue, 2 slots in the medium queue, 1 slot in the long queue and 1 slot in the parallel queue. This gives a maximum of 7 concurrent jobs per node (with an average space in RAM of 290MB per process) and a maximum of 224 concurrent jobs in the whole system.
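The slot figures in the results above can be checked with a few lines of arithmetic. This is a sketch under two assumptions that the minutes do not state directly: the node count of 32 is inferred from the node names r02 to r33, and 2GB of RAM is taken as 2048MB. The function and variable names are illustrative only.

```python
# Slots per node and per queue, as decided in the meeting.
SLOTS_PER_NODE = {"short": 3, "medium": 2, "long": 1, "parallel": 1}

NODES = 32            # assumption: nodes r02 to r33 inclusive
AVG_JOB_MB = 290      # average space in RAM per process, from the minutes
NODE_RAM_MB = 2 * 1024  # assumption: 2GB of RAM counted as 2048MB


def capacity(slots_per_node, nodes, avg_job_mb):
    """Return (jobs per node, jobs in the whole system, RAM in MB used per full node)."""
    per_node = sum(slots_per_node.values())
    return per_node, per_node * nodes, per_node * avg_job_mb


per_node, system_wide, ram_needed_mb = capacity(SLOTS_PER_NODE, NODES, AVG_JOB_MB)
print(per_node, system_wide, ram_needed_mb)  # 7 224 2030
print(ram_needed_mb <= NODE_RAM_MB)          # True
```

With average-sized jobs, a fully loaded node needs about 2030MB, which is just under the 2GB of RAM now installed, so jobs of average size should no longer push a node into swap.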
Thursday 26th January
Agenda
- New Wiki pages
- New conferences
- Announcements and Publications
Thursday 2nd February
Agenda
- New Wiki pages
- New conferences
- Announcements and Publications
Thursday 9th February
Agenda
- New Wiki pages
- New conferences
- Announcements and Publications
Thursday 16th February
Agenda
- New Wiki pages
- New conferences
- Announcements and Publications
Thursday 23rd February
Agenda
- New Wiki pages
- New conferences
- Announcements and Publications
Thursday 2nd March
Agenda
- New Wiki pages
- New conferences
- Announcements and Publications
Thursday 9th March
Agenda
- New Wiki pages
- New conferences
- Announcements and Publications
Thursday 16th March
Agenda
- New Wiki pages
- New conferences
- Announcements and Publications