Difference between revisions of "Using the IRIDIA Cluster"

See [http://majorana.ulb.ac.be/wordpress/ http://majorana.ulb.ac.be/wordpress/]
== Cluster composition ==
 
 
Currently the IRIDIA cluster is composed of two kinds of computational nodes: diskless PCs (14 nodes) and rack units (32 nodes). Each diskless PC has one AMD Athlon CPU, while each rack unit has two AMD Opteron CPUs.
 
 
 
COMPLEX_NAME: '''athlon1400'''
 
 
- AMD Athlon 1400 (1 CPU @ 1.36 GHz)
 
 
p15
 
 
 
COMPLEX_NAME: '''athlon1800'''
 
 
- AMD Athlon 1800+ (1 CPU @ 1.46 GHz)
 
 
p06
 
 
 
COMPLEX_NAME: '''athlon2200'''
 
 
- AMD Athlon 2200+ (1 CPU @ 1.75 GHz)
 
 
p07
 
 
 
COMPLEX_NAME: '''athlon2400'''
 
 
- AMD Athlon 2400+ (1 CPU @ 1.95 GHz)
 
 
p02, p03, p04, p05, p08
 
 
 
COMPLEX_NAME: '''athlon2800'''
 
 
- AMD Athlon 2800+ (1 CPU @ 2.03 GHz)
 
 
p17, p18, p19, p20, p21, p22
 
 
 
COMPLEX_NAME: '''athlon'''
 
 
- All the AMD Athlon processors of the cluster
 
 
p02, p03, p04, p05, p06, p07, p08, p15, p17, p18, p19, p20, p21, p22
 
 
 
COMPLEX_NAME: '''opteron244'''
 
 
- AMD Opteron 244 (2 CPUs @ 1.75 GHz)
 
 
r02, r03, r04, r05, r06, r07, r08, r09, r10, r11, r12, r13, r14, r15, r16, r17, r18, r19, r20, r21, r22, r23, r24, r25, r26, r27, r28, r29, r30, r31, r32, r33
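
The node lists above can be written down as a small lookup, which may help when checking which complex a given host belongs to. This is only a sketch: the function name `complex_for_node` is hypothetical and not part of the cluster software, while the mapping itself is copied from the list above.

```shell
#!/bin/bash
# Hypothetical helper: print the most specific complex name for a node.
# The node-to-complex mapping is copied from the list above.
complex_for_node() {
    case "$1" in
        p15)                        echo "athlon1400" ;;
        p06)                        echo "athlon1800" ;;
        p07)                        echo "athlon2200" ;;
        p02|p03|p04|p05|p08)        echo "athlon2400" ;;
        p17|p18|p19|p20|p21|p22)    echo "athlon2800" ;;
        r0[2-9]|r[12][0-9]|r3[0-3]) echo "opteron244" ;;   # r02 .. r33
        *) echo "unknown node: $1" >&2; return 1 ;;
    esac
}
```

Every diskless PC listed under one of the specific Athlon complexes also belongs to the generic '''athlon''' complex, so requesting '''athlon''' matches any of them.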
 
 
== Queues ==
 
 
Each computational node has the following 3 queues:
 
 
 
*'''<machine>.short''': at most 6 jobs can run concurrently in this queue, at nice level 2. Each job can use '''at most 24h of CPU time''' (that is, actual execution time of the program, excluding the time the system spends on multitasking, etc.) and '''cannot use more than 512MB of memory'''. A job that reaches 24 hours of CPU time receives a SIGUSR1 signal and, some time later, a SIGKILL that terminates it. A job that tries to use more than 512MB of memory is terminated by a SIGKILL.
 
 
*'''<machine>.medium''': at most 4 jobs can run concurrently in this queue, at nice level 5 (lower priority than the short ones). Each job can use '''at most 72h of CPU time''' (that is, actual execution time of the program, excluding the time the system spends on multitasking, etc.) and '''cannot use more than 512MB of memory'''. A job that reaches 72 hours of CPU time receives a SIGUSR1 signal and, some time later, a SIGKILL that terminates it. A job that tries to use more than 512MB of memory is terminated by a SIGKILL.
 
 
*'''<machine>.long''': only 1 job at a time can run in this queue, at nice level 10 (lower priority than the short and the medium ones). The job can use '''at most 168h of CPU time''' (that is, actual execution time of the program, excluding the time the system spends on multitasking, etc.) and '''cannot use more than 512MB of memory'''. A job that reaches 168 hours of CPU time receives a SIGUSR1 signal and, some time later, a SIGKILL that terminates it. A job that tries to use more than 512MB of memory is terminated by a SIGKILL.
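
Because a job receives SIGUSR1 some time before the final SIGKILL, a job script can use that window to save its state. The following is a minimal sketch, assuming that the signal reaches the shell running the script and that the work can be cut into short steps. The names `on_sigusr1`, `run_steps` and the checkpoint file are hypothetical placeholders, not part of the cluster setup.

```shell
#!/bin/bash
# Hypothetical checkpoint file; choose any path your job can write to.
CHECKPOINT_FILE="${CHECKPOINT_FILE:-checkpoint.txt}"
stop_requested=0

# Handler: only record that a stop was requested.
on_sigusr1() {
    stop_requested=1
}
trap on_sigusr1 USR1

# Placeholder work loop: each iteration stands for one short unit of work.
run_steps() {
    total_steps=$1
    step=1
    while [ "$step" -le "$total_steps" ]; do
        # ... one unit of real work would go here ...
        echo "$step" > "$CHECKPOINT_FILE"   # remember the last finished step
        if [ "$stop_requested" -eq 1 ]; then
            echo "SIGUSR1 received, stopping after step $step"
            return 0
        fi
        step=$((step + 1))
    done
    echo "all $total_steps steps done"
}
```

Note that bash runs a trap handler only after the current foreground command has returned, so each unit of work should be short compared with the delay between SIGUSR1 and SIGKILL.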
 
 
 
In summary, each CPU can run up to 11 jobs concurrently (6 short, 4 medium and 1 long).
 
 
 
'''YOU HAVE TO DESIGN YOUR COMPUTATIONS IN SUCH A WAY THAT EACH SINGLE JOB DOESN'T RUN FOR MORE THAN 7 DAYS (of CPU time) AND DOESN'T USE MORE THAN 512MB OF MEMORY'''.
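
As a rough planning aid, the number of jobs needed for a long computation follows from a ceiling division of the total CPU time by the per-job limit. This sketch assumes the computation can be split into independent pieces of roughly equal cost; the helper name `jobs_needed` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: number of jobs needed when the whole computation
# takes total_hours of CPU time and each job may use at most limit_hours.
# limit_hours defaults to 168 (the 7-day limit of the long queue).
jobs_needed() {
    total_hours=$1
    limit_hours=${2:-168}
    # Integer ceiling division: (a + b - 1) / b
    echo $(( (total_hours + limit_hours - 1) / limit_hours ))
}

jobs_needed 400      # 400h of CPU time in long-queue jobs: prints 3
jobs_needed 400 24   # the same work in short-queue jobs: prints 17
```

In practice it is safer to leave some margin below the limit, since an estimate of CPU time is rarely exact.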
 
 
== How to submit a job ==
 
 
 
To submit a job that lasts up to 1 day, specify -l COMPLEX_NAME -l shorttime in the shell script passed to the qsub command, as in this example:
 
 
 #!/bin/bash
 # Replace complex_name with one of the complex names listed above.
 #$ -N name_of_the_short_job
 #$ -l complex_name
 #$ -l shorttime
 #$ -cwd
 
 
 
To submit a job that lasts up to 3 days, specify -l COMPLEX_NAME -l mediumtime in the shell script passed to the qsub command, as in this example:
 
 
 #!/bin/bash
 # Replace complex_name with one of the complex names listed above.
 #$ -N name_of_the_medium_job
 #$ -l complex_name
 #$ -l mediumtime
 #$ -cwd
 
 
 
To submit a job that lasts up to 7 days, specify -l COMPLEX_NAME -l longtime in the shell script passed to the qsub command, as in this example:
 
 
 #!/bin/bash
 # Replace complex_name with one of the complex names listed above.
 #$ -N name_of_the_long_job
 #$ -l complex_name
 #$ -l longtime
 #$ -cwd
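
The three headers differ only in the time resource, so the choice can be derived from the estimated CPU time. Below is a sketch under that assumption; `time_class_for_hours` and `print_job_header` are hypothetical helper names, and the thresholds are the 24h, 72h and 168h limits of the queues described earlier.

```shell
#!/bin/bash
# Map an estimated CPU time (whole hours) to the time resource name.
time_class_for_hours() {
    if   [ "$1" -le 24 ];  then echo "shorttime"
    elif [ "$1" -le 72 ];  then echo "mediumtime"
    elif [ "$1" -le 168 ]; then echo "longtime"
    else
        echo "no queue accepts jobs longer than 168h" >&2
        return 1
    fi
}

# Hypothetical generator: print a job script header for the given
# job name ($1), complex name ($2) and estimated CPU hours ($3).
print_job_header() {
    time_class=$(time_class_for_hours "$3") || return 1
    printf '#!/bin/bash\n'
    printf '#$ -N %s\n' "$1"
    printf '#$ -l %s\n' "$2"
    printf '#$ -l %s\n' "$time_class"
    printf '#$ -cwd\n'
}

# Example: a 48h job on the Opteron nodes gets the mediumtime resource.
print_job_header my_experiment opteron244 48
```

The printed header can be redirected to a file, followed by the commands the job should run, and the resulting file passed to qsub.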
 
 
 
'''THE SCHEDULER WILL NOT RUN MORE THAN 64 JOBS OF THE SAME USER AT THE SAME TIME. IF YOU SUBMIT MORE THAN 64 JOBS, AT MOST 64 WILL BE RUNNING AT THE SAME TIME'''.
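
Since the scheduler itself caps the number of running jobs per user, a parameter sweep can simply be submitted in a loop. The sketch below assumes that qsub passes arguments given after the script name on to the job script, as Grid Engine style schedulers usually do. The function `submit_sweep` and the script name `job.sh` are hypothetical, and by default the loop only prints the commands instead of submitting anything.

```shell
#!/bin/bash
# QSUB defaults to "echo qsub", so the loop is a dry run that only prints
# the commands. Set QSUB=qsub to really submit (assumption: qsub is in PATH).
QSUB="${QSUB:-echo qsub}"

# Hypothetical helper: submit one job per integer in [first, last],
# passing the integer to the job script as its first argument.
submit_sweep() {
    script=$1
    i=$2
    last=$3
    while [ "$i" -le "$last" ]; do
        $QSUB "$script" "$i"
        i=$((i + 1))
    done
}

# Dry run: prints the commands for runs 1 to 3.
submit_sweep job.sh 1 3
```

With more than 64 submissions, at most 64 of the jobs will be running at any given moment, as stated above.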
 

Latest revision as of 09:21, 8 August 2012