Using the IRIDIA Cluster
Revision as of 10:59, 20 September 2005
Cluster composition
COMPLEX_NAME: athlon1400
- AMD Athlon 1400 (1 CPU @ 1.36 GHz)
p15
COMPLEX_NAME: athlon1800
- AMD Athlon 1800+ (1 CPU @ 1.46 GHz)
p06
COMPLEX_NAME: athlon2200
- AMD Athlon 2200+ (1 CPU @ 1.75 GHz)
p07
COMPLEX_NAME: athlon2400
- AMD Athlon 2400+ (1 CPU @ 1.95 GHz)
p02, p03, p04, p05, p08
COMPLEX_NAME: athlon2800
- AMD Athlon 2800+ (1 CPU @ 2.03 GHz)
p17, p18, p19, p20, p21, p22
COMPLEX_NAME: opteron244
- AMD Opteron 244 (2 CPUs @ 1.75 GHz)
r02, r03, r04, r05, r06, r07, r08, r09, r10, r11, r12, r13, r14, r15, r16, r17, r18, r19, r20, r21, r22, r23, r24, r25, r26, r27, r28, r29, r30, r31, r32, r33
Queues
Each machine has the following three queues:
- <machine>.short: at most 6 jobs can run concurrently in this queue. Each job may use at most 24h of CPU time (that is, time spent actually executing the program, not counting the time the system needs for multitasking, etc.). When a job reaches the 24th hour it receives a SIGUSR1 signal and, some time later, a SIGKILL that terminates it.
- <machine>.medium: at most 4 jobs can run concurrently in this queue, and they run at nice level 5 (lower priority than the short ones). Each job may use at most 72h of CPU time (that is, time spent actually executing the program, not counting the time the system needs for multitasking, etc.). When a job reaches the 72nd hour it receives a SIGUSR1 signal and, some time later, a SIGKILL that terminates it.
- <machine>.long: only 1 job at a time can run in this queue, and it runs at nice level 10 (lower priority than the short and medium ones). The job may use at most 168h of CPU time (that is, time spent actually executing the program, not counting the time the system needs for multitasking, etc.). When it reaches the 168th hour it receives a SIGUSR1 signal and, some time later, a SIGKILL that terminates it.
YOU MUST DESIGN YOUR COMPUTATIONS SO THAT NO JOB OCCUPIES A CPU FOR MORE THAN 7 CONSECUTIVE DAYS (168h) OF CPU TIME!
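Because SIGUSR1 arrives some time before SIGKILL, a job can use it as a warning to save its progress and resume in a later job. The following is a minimal sketch of that idea, not a prescribed method: the names CHECKPOINT_FILE, load_checkpoint, save_checkpoint and run_steps are hypothetical, and the loop body stands in for one unit of real work.

```shell
#!/bin/bash
# Sketch: stop cleanly and save progress when the CPU-time limit is reached.
# All names below are illustrative assumptions, not part of the cluster setup.

CHECKPOINT_FILE="${CHECKPOINT_FILE:-checkpoint.txt}"
stop_requested=0

# SIGUSR1 is the warning; the later SIGKILL cannot be trapped.
# Bash runs a trap only after the current foreground command finishes,
# so each unit of work should be short compared to the grace period.
trap 'stop_requested=1' USR1

# Print the step to resume from (0 if there is no checkpoint yet).
load_checkpoint() {
    if [ -f "$CHECKPOINT_FILE" ]; then
        cat "$CHECKPOINT_FILE"
    else
        echo 0
    fi
}

# Record the next step that still has to be done.
save_checkpoint() {
    echo "$1" > "$CHECKPOINT_FILE"
}

# Run steps up to $1, resuming from the checkpoint.
# Returns 0 when all steps are done, 1 when interrupted by SIGUSR1.
run_steps() {
    local total=$1
    local step
    step=$(load_checkpoint)
    while [ "$step" -lt "$total" ]; do
        if [ "$stop_requested" -eq 1 ]; then
            save_checkpoint "$step"
            return 1
        fi
        # ... one unit of real work for step $step would go here ...
        step=$((step + 1))
    done
    save_checkpoint "$step"
    return 0
}
```

A job script built on this sketch would end with a call such as run_steps 1000. If the call returns 1, resubmitting the same script continues from the saved step, provided the checkpoint file is on storage that the next job can read.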
How to submit a job
To submit a job that needs up to 1 day of CPU time, specify -l COMPLEX_NAME -l shorttime in the shell script passed to the qsub command, as in this example:
    #!/bin/bash
    #$ -N name_of_the_short_job
    #$ -l complex_name
    #$ -l shorttime
    #$ -cwd
To submit a job that needs up to 3 days of CPU time, specify -l COMPLEX_NAME -l mediumtime in the shell script passed to the qsub command, as in this example:
    #!/bin/bash
    #$ -N name_of_the_medium_job
    #$ -l complex_name
    #$ -l mediumtime
    #$ -cwd
To submit a job that needs up to 7 days of CPU time, specify -l COMPLEX_NAME -l longtime in the shell script passed to the qsub command, as in this example:
    #!/bin/bash
    #$ -N name_of_the_long_job
    #$ -l complex_name
    #$ -l longtime
    #$ -cwd
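The three headers differ only in the job name and the time resource, so they can be generated from an estimate of the CPU time. The sketch below assumes the 24h, 72h and 168h limits given in the Queues section; the helper names pick_time_class and make_job_script are hypothetical and are not part of the cluster software.

```shell
#!/bin/bash
# Sketch: pick the time resource from a CPU-hour estimate and print a job script.
# pick_time_class and make_job_script are illustrative names only.

# Map an estimate in whole CPU hours to shorttime, mediumtime or longtime.
pick_time_class() {
    local hours=$1
    if [ "$hours" -le 24 ]; then
        echo shorttime
    elif [ "$hours" -le 72 ]; then
        echo mediumtime
    elif [ "$hours" -le 168 ]; then
        echo longtime
    else
        echo "estimate of ${hours}h exceeds the 168h limit" >&2
        return 1
    fi
}

# Arguments: job name, complex name, CPU-hour estimate, then the command to run.
# Prints a complete job script on standard output.
make_job_script() {
    local name=$1 complex=$2 hours=$3
    shift 3
    local class
    class=$(pick_time_class "$hours") || return 1
    cat <<EOF
#!/bin/bash
#\$ -N $name
#\$ -l $complex
#\$ -l $class
#\$ -cwd
$*
EOF
}
```

For example, make_job_script my_job athlon2800 48 ./my_program > job.sh writes a script that requests mediumtime on the athlon2800 machines, which can then be submitted with qsub job.sh. Here ./my_program is a placeholder for the actual program.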