Difference between revisions of "Administration weekly meetings"

Revision as of 14:50, 19 January 2006

Previous administration meeting

Thursday 15th December

Agenda

New Wiki pages
New conferences
Christos: There are 40 EUR left from the coffee machine contribution. Let's discuss how to spend this money. One idea is to buy some common lab coffee that we can all use; buying a couple of cakes would not be a bad idea either. Let's discuss this, because there's ~1.9 EUR in there from everybody!
Max: Issues with NFS on the IRIDIA Cluster.
- The solution adopted by Prasanna for his experiments relieves some "stress" from the hard drive (the bottleneck for NFS).
Max: Memory for the nodes of IRIDIA Cluster.
- I have installed 4x512MB on r17 and 8x256MB on r04. Some quick tests seem to show a small difference in performance.
Rodi: Policy for IRIDIA Tech Report
Marco/Mauro: Backup databases
Anders: Coffee machine and decalcification?

Results

New Wiki pages:
- Updated LaTeX package for technical reports - can be downloaded from IRIDIA Technical reports.
New Conferences
- "IEEE 2006 Workshop on Distributed Intelligent Systems" added to Robotics and AI conferences, journals and impact factors
Cluster and NFS: Frequent reading and writing over NFS on the cluster is slow. Use /tmp (local to each node) for intermediate files; this can give a 3x speed-up. A Wiki page should be made on this type of programming-for-the-cluster trick (Max).
Cluster and RAM: Max and Rodi are currently testing different memory configurations for the cluster.
IRIDIA technical reports and dates: The date of the technical report is the day on which you get the technical report number from Muriel. This should prevent situations where a technical report from e.g. 2004 references publications from 2005.
Database backup: The DB backup should be done using mysqldump and not by copying files. Alex is responsible.
Coffee machine: Tom is responsible for hunting down the coffee machine manual and decalcifying the machine.
New permanent point for the weekly admin meetings: Rodi suggested adding a new default item to the agendas for the admin meetings, namely "Announcements and Publications". The idea is that anyone who is about to publish something or has published something briefly tells everyone else about it. Rodi's creative suggestion was accepted without much bloodshed.
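
The /tmp advice under "Cluster and NFS" above can be sketched roughly as follows. This is only an illustration, not the procedure actually used on the IRIDIA cluster: the function name run_with_local_scratch, the file names and the result directory are hypothetical, and the "computation" is a placeholder. The point is that the many small intermediate writes go to node-local scratch space, and only one final file is copied to the shared (NFS) directory.

```python
import os
import shutil
import tempfile


def run_with_local_scratch(result_dir, n_steps=1000, scratch_root=None):
    """Keep intermediate files on node-local scratch; copy one result out.

    result_dir stands in for a directory on the NFS share (hypothetical).
    scratch_root defaults to the system temp directory, normally /tmp.
    """
    os.makedirs(result_dir, exist_ok=True)
    scratch = tempfile.mkdtemp(prefix="job_", dir=scratch_root)
    try:
        partial = os.path.join(scratch, "partial.txt")
        # Many small writes go to the local disk, not to the NFS server.
        with open(partial, "w") as f:
            for step in range(n_steps):
                f.write(f"{step} {step * step}\n")

        # Reduce the intermediate file to a single number (placeholder work).
        with open(partial) as f:
            total = sum(int(line.split()[1]) for line in f)

        summary = os.path.join(scratch, "summary.txt")
        with open(summary, "w") as f:
            f.write(f"{total}\n")

        # A single copy to the shared directory at the very end.
        final_path = os.path.join(result_dir, "summary.txt")
        shutil.copy(summary, final_path)
        return final_path
    finally:
        # Always free the local scratch space, even if the job fails.
        shutil.rmtree(scratch, ignore_errors=True)
```

Cleaning up in the finally block matters on a shared cluster, since /tmp on each node is limited and is used by other people's jobs as well.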

Thursday 12th January

Agenda

New Wiki pages
New conferences
Announcements and Publications
Anders: Cluster queues and nice levels on the cluster - it seems like they need to be balanced if the "long" jobs are supposed to ever finish.
Max: Cluster maintenance - I need to shut down the cluster for 1 day in order to perform some maintenance tasks (we received the new RAM and the new power cord).

Results

New Wiki pages
- IRIDIA Technical reports
New conferences: None
Announcements and Publications: Rodi, Tom, Francisco and Anders had new publications.
Anders: Cluster queues and nice levels on the cluster
- The cluster queues will be adjusted.
- Max: The queues have been reniced like this: short=nice 2, medium=nice 3, long=nice 3, par=nice 3
Max: Cluster maintenance
- Don't submit any jobs to the cluster before the maintenance has been performed. Submitters need to provide an estimate of the time needed for the jobs currently in the queue. Max will let everyone know when the cluster maintenance will take place once he knows.
  - Max: Status on Jan 13 - Prasanna's job should finish in 2-3 days; Tom has not yet given me a forecast.

Thursday 19th January

Agenda

New Wiki pages
New conferences
Announcements and Publications
- Max: Cluster and queueing system: To prevent a node from being choked by too many concurrent jobs (this was happening when nodes were making heavy use of swap space), we need to tune the number of slots per queue.

Results

New Wiki pages
- Max: Apple computer security
- Max: Programming tips for the cluster
- Max: IRIDIA Wi-Fi network
New conferences: None
Announcements and Publications
- Cluster and queueing system
  - 1GB of memory has been added to each node of the cluster. Each node now has 2GB of RAM, 4.5GB of swap space, and 20GB of /tmp space for local data storage. Nodes r02 to r17 have 4x512MB DDR ECC REG DIMMs, while nodes r18 to r33 have 8x256MB DDR ECC REG DIMMs.
  - The queues have been reniced like this: short=nice 2, medium=nice 3, long=nice 3, par=nice 2
  - The slots for the queues will be changed as follows: 3 slots in the short queue, 2 slots in the medium queue, 1 slot in the long queue and 1 slot in the parallel queue. This gives a maximum of 7 concurrent jobs per node (with an average of 290MB of RAM per process) and a maximum of 224 concurrent jobs in the whole system.
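
The slot figures above can be checked with a few lines of arithmetic. The sketch below rests on two assumptions that the minutes do not state outright: that the compute nodes are exactly r02 to r33 (32 nodes, which is consistent with 224 / 7), and that "2GB" means 2048MB. The function names are made up for illustration.

```python
# Slots per queue on each node, as listed in the minutes.
SLOTS_PER_NODE = {"short": 3, "medium": 2, "long": 1, "parallel": 1}

# Assumption: the compute nodes are r02 through r33 inclusive.
NODE_COUNT = 33 - 2 + 1

# Assumption: "2GB of RAM" is taken as 2048 MB.
RAM_PER_NODE_MB = 2048


def jobs_per_node(slots):
    """Maximum concurrent jobs on one node: the sum of its queue slots."""
    return sum(slots.values())


def cluster_capacity(slots, node_count):
    """Maximum concurrent jobs over all nodes."""
    return jobs_per_node(slots) * node_count


def ram_per_job_mb(slots, ram_mb):
    """Average RAM available per job when every slot on a node is busy."""
    return ram_mb / jobs_per_node(slots)


if __name__ == "__main__":
    print("jobs per node:", jobs_per_node(SLOTS_PER_NODE))
    print("jobs in cluster:", cluster_capacity(SLOTS_PER_NODE, NODE_COUNT))
    print("MB per job: %.0f" % ram_per_job_mb(SLOTS_PER_NODE, RAM_PER_NODE_MB))
```

Under these assumptions the per-node and cluster-wide totals match the 7 and 224 quoted above, and the RAM per job comes out a little over 290MB, in line with the rounded figure in the minutes.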

Thursday 26th January

Agenda

New Wiki pages
New conferences
Announcements and Publications

Thursday 2nd February

Agenda

New Wiki pages
New conferences
Announcements and Publications

Thursday 9th February

Agenda

New Wiki pages
New conferences
Announcements and Publications

Thursday 16th February

Agenda

New Wiki pages
New conferences
Announcements and Publications

Thursday 23rd February

Agenda

New Wiki pages
New conferences
Announcements and Publications

Thursday 2nd March

Agenda

New Wiki pages
New conferences
Announcements and Publications

Thursday 9th March

Agenda

New Wiki pages
New conferences
Announcements and Publications

Thursday 16th March

Agenda

New Wiki pages
New conferences
Announcements and Publications

@@ Line 60: / Line 60: @@
 * New conferences
 * Announcements and Publications
+** '''Max:''' Cluster and queueing system: To prevent a node from being choked by too many concurrent jobs (this was happening when nodes were making heavy use of swap space), we need to tune the number of slots per queue.
-** '''Max:''' Cluster and queueing system
-*** To prevent a node from being choked by too many concurrent jobs (this was happening when nodes were making heavy use of swap space), we need to tune the number of slots per queue. The current situation per node is the following: 5 slots in the short queue, 3 slots in the medium queue, 1 slot in the long queue and 3 slots in the parallel queue. I propose to reduce the number of concurrently running jobs to this: 3 slots in the short queue, 2 slots in the medium queue, 1 slot in the long queue and 1 slot in the parallel queue. This would bring the number of concurrent jobs per node from 12 to 7 (with an average of 290MB of RAM per process) and the maximum number of concurrent jobs in the whole system from 384 to 224 (in the last year the maximum number of concurrent jobs was around 220, while the average was much lower, in the range of 100 jobs).
 '''''Results'''''
