Using the IRIDIA Cluster

Cluster composition

Currently the IRIDIA cluster is composed of 1 server (majorana) and 32+16 rack units (computational nodes). The nodes are housed in two racks. The older rack contains 32 units (c0-0 to c0-31), each featuring 2 AMD Opteron 244 CPUs running at 1.75GHz and 2GB of RAM (nodes c0-0 to c0-15 have 4 modules of 512MB each, 400MHz DDR ECC REG DIMM; nodes c0-16 to c0-31 have 8 modules of 256MB each, 400MHz DDR ECC REG DIMM). The newer rack contains 16 units (c1-0 to c1-15), each featuring 2 Dual-Core AMD Opteron 2216 HE processors running at 2.4GHz and 4GB of RAM. In total the cluster provides 128 CPU cores dedicated to computation and 2 CPUs for administrative purposes.


COMPLEX_NAME: opteron244

- AMD Opteron 244 (2 CPUs @ 1.75GHz)

nodes: c0-0, c0-1, c0-2, c0-3, c0-4, c0-5, c0-6, c0-7, c0-8, c0-9, c0-10, c0-11, c0-12, c0-13, c0-14, c0-15, c0-16, c0-17, c0-18, c0-19, c0-20, c0-21, c0-22, c0-23, c0-24, c0-25, c0-26, c0-27, c0-28, c0-29, c0-30, c0-31


COMPLEX_NAME: opteron2216

- Dual-Core AMD Opteron 2216 HE (2 CPUs @ 2.4GHz)

nodes: c1-0, c1-1, c1-2, c1-3, c1-4, c1-5, c1-6, c1-7, c1-8, c1-9, c1-10, c1-11, c1-12, c1-13, c1-14, c1-15

Queues

Each computational node has the following 4 queues:


  • <machine>.short: at most 2 jobs can run concurrently in this queue, at nice-level 2. Each job can run for a maximum of 24h of CPU time (actual execution time of the program, not counting the time spent by the system on multitasking, etc.). If a job is still running after the 24th hour, it receives a SIGUSR1 signal and, some time later, a SIGKILL that terminates it.
  • <machine>.medium: at most 2 jobs can run concurrently in this queue, at nice-level 3 (lower priority than the short ones). Each job can run for a maximum of 72h of CPU time (same definition as above). If a job is still running after the 72nd hour, it receives a SIGUSR1 signal and, some time later, a SIGKILL that terminates it.
  • <machine>.long: only 1 job at a time can run in this queue, at nice-level 3 (lower priority than the short ones). The job can run for a maximum of 168h of CPU time (same definition as above). If a job is still running after the 168th hour, it receives a SIGUSR1 signal and, some time later, a SIGKILL that terminates it.
  • <machine>.par: this queue accepts only parallel jobs. 1 job at a time can run in this queue, at nice-level 3 (lower priority than the short ones).

Summarizing: each node can run up to 6 jobs concurrently (distributed over its 2 CPUs), with on average 341MB of RAM available per job. The queueing system can run at most 192 concurrent jobs on the whole cluster.
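
To see these queues and their current load before submitting, the queueing system can be queried from majorana with the standard Grid Engine status command. A minimal sketch (the queue instance names follow the <machine>.<type> scheme described above; jdoe is a placeholder user name):

# list every queue instance (e.g. c0-0.short, c1-15.par) with its slots and load
qstat -f

# show only your own pending and running jobs
qstat -u jdoe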


YOU HAVE TO DESIGN YOUR COMPUTATIONS IN SUCH A WAY THAT EACH SINGLE JOB DOESN'T RUN FOR MORE THAN 7 DAYS (of CPU time).


THE SCHEDULER CANNOT EXECUTE MORE THAN 64 JOBS OF THE SAME USER AT THE SAME TIME. IF YOU SUBMIT MORE THAN 64 JOBS, AT MOST 64 WILL BE RUNNING AT ANY GIVEN TIME.

How to submit a job

To submit a job that lasts up to 1 day, you have to specify -l shorttime in the shell script passed to the qsub command, as in this example:

#!/bin/bash
#$ -N test_short
#$ -l opteron244
#$ -l shorttime
#$ -cwd
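
Assuming the directives above are saved in a file such as test_short.sh (the name is only an example) and are followed by the actual command(s) to execute, the script is submitted and monitored with the usual Grid Engine commands; a minimal sketch:

# submit the job script to the queueing system
qsub test_short.sh

# list your pending (qw) and running (r) jobs
qstat

# remove a job if necessary, using the job id printed by qsub (12345 is a placeholder)
qdel 12345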


To submit a job that lasts up to 3 days, you have to specify -l mediumtime in the shell script passed to the qsub command, as in this example:

#!/bin/bash
#$ -N test_medium
#$ -l opteron244
#$ -l mediumtime
#$ -cwd


To submit a job that lasts up to 7 days, you have to specify -l longtime in the shell script passed to the qsub command, as in this example:

#!/bin/bash
#$ -N test_long
#$ -l opteron244
#$ -l longtime
#$ -cwd


To submit a job that runs in the parallel environment, you have to specify -l parallel -pe PARALLEL_ENV NUM_PROCESS in the shell script passed to the qsub command, as in this example:

#!/bin/bash
#$ -N test_parallel
#$ -l opteron244
#$ -l parallel
#$ -pe pvm 10
#$ -cwd
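
The -pe line above assumes a parallel environment called pvm with 10 slots requested. To check which parallel environments are actually configured on the cluster, the Grid Engine configuration can be queried; a sketch:

# list the names of the configured parallel environments
qconf -spl

# show the configuration (slot limits, start/stop procedures) of a given one
qconf -sp pvm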

Submission tips for the cluster

If your job lasts less than 1 day, it doesn't matter in which queue it ends up, because no time constraint will be violated. In this case you may want it to be placed in the first available queue, no matter which one. To do so, simply remove the -l queue_name line from your script, as in the sketch below.
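
For instance, a short job that accepts whatever queue is free first could use only the following directives (test_any is just an illustrative job name; the command to run follows the directives as usual):

#!/bin/bash
#$ -N test_any
#$ -l opteron244
#$ -cwd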


If your job lasts less than 1 day and you want it to run in either the short time queue or the medium time queue, no matter which of the two, write -l shortmedium as the queue type.


If your job lasts less than 3 days and you want it to run in either the medium time queue or the long time queue, no matter which of the two, write -l mediumlong as the queue type; see the example below.
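
As an illustration, a job of at most 3 days that may end up in either the medium or the long time queue could be declared as follows (test_mediumlong is just an example job name):

#!/bin/bash
#$ -N test_mediumlong
#$ -l opteron244
#$ -l mediumlong
#$ -cwd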

Programming tips for the cluster

If a job needs to read/write a lot and often, it is better to copy the input files to the /tmp directory (which is on the local hard drive of the node) and to write the output files there as well, moving them to the /home/user_name directory only when the computation is over. This way your job does not have to go through NFS for each read/write operation, which relieves majorana of some load (the /home partition is exported from there to all the nodes) and makes the job faster (Prasanna measured a speedup of 2-3x on his code).
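
A possible skeleton for such a job script, assuming a program my_program that reads input.dat and writes output.dat (all these names are placeholders), could look like this; it relies on the JOB_ID variable that the queueing system sets for every job:

#!/bin/bash
#$ -N test_tmp
#$ -l opteron244
#$ -l shorttime
#$ -cwd

# create a job-specific working directory on the node's local disk
WORKDIR=/tmp/$USER.$JOB_ID
mkdir -p $WORKDIR

# copy the input from the NFS-mounted home directory to the local disk
cp $HOME/input.dat $WORKDIR/

# run the computation entirely on the local disk
cd $WORKDIR
$HOME/my_program input.dat output.dat

# move the results back to the home directory and clean up
mv output.dat $HOME/
cd /
rm -rf $WORKDIR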
