Talk:Using the IRIDIA Cluster
Discussion on who is supposed to use the cluster, and abuse
- Outcome of admin meeting of Thursday 20th April 2006
- Who can use the cluster
- Only people directly associated with IRIDIA (students, post docs, etc.) can use the cluster. Researchers who left should arrange with their contact/collaborator/supervisor here, to use his/her login in case they need to use the cluster.
- The cluster-policy
- Max explained the cluster policy; a memory limit might be introduced.
- Elio's thoughts :
Dear all, please take my thoughts in a constructive way. I read on the IRIDIA Wiki page the following statement, which I take as the official position of the Lab: "Only people directly associated with IRIDIA (students, post docs, etc.) can use the cluster. Researchers who left should arrange with their contact/collaborator/supervisor here, to use his/her login in case they need to use the cluster" Taken from: http://iridia.ulb.ac.be/wiki/index.php/Administration_weekly_meetings#Thursday_20th_April_2006
With respect to the above statement I would like to add some comments. In my opinion, there are at least two reasons which suggest taking a different direction to free the cluster from overuse. I believe that the idea of revealing the login/password of IRIDIA members to "trusted" people is not the most efficient one because:
1) History teaches us that even trusted people can cause problems. We experienced such a case some time ago, when an IRIDIA member created some trouble for Halva, who was maintaining the cluster at that time. I don't want to stigmatize the case. I DO believe that the action was the result of inexperience rather than of a real intention to cause damage. However, it teaches us that even trusted people can cause problems. This is obviously always the case, but if everyone has their own login/password it is easier to spot who is responsible, I guess, and sorry if I am wrong on this.
2) If using someone else's login/password becomes the official practice for the IRIDIA cluster, as I read in the above statement, then even the trusted people can recursively apply the same practice, and in a while we might have more jobs running than we have at the moment. Do we have tools to prevent this from happening?
Thanks a lot for your attention. Cheers, Elio
- Alex :
- I agree with what Elio says. A login must be personal, and associated with one person only. If you want to prevent somebody from accessing the cluster, just disable the account. If we trust that person then it is fine, and we can let her access the cluster.
- In general of course, there should be a grace period during which an account is still usable, at least to retrieve files, to launch the last five replications that you always delayed, and so on. Accounts should simply be deleted once that period is over and the owner of the account is no longer working with IRIDIA. If someone is collaborating with a member of IRIDIA I think it is ok to let him have an account.
- If we disagree with the way people use the cluster, then either :
- we find ways to define an efficient policy of cluster usage.
- or we point out the problems and discuss them. It is anyway a matter of fairness.
- I think that the cluster of IRIDIA is very good and, unless everybody wants to run simulations at the same time, it is enough for all of us. Users should carefully design their source code to make it fast. Users should also avoid using the cluster for debugging purposes. You should not submit a set of simulations just to see if your code works. That you can do on your own computer, and this way you respect the common resources.
- As it is a common resource, we should also understand that it may not be immediately accessible. So when you plan to run a simulation you should know in advance that you may have to wait 1 day before your jobs start running.
I do believe that everyone should have their own login. Logins are not a restricted resource, so we can have as many of them as we want. If you fear that users might misuse the cluster (as in the example Elio was referring to), there are ways to tighten the kernel security and the resource usage. If anybody wants to look into the matter, and if she/he HAS time to do it, she/he is welcome! :)
I also think that the current policy is pretty fair. A job gets a lower priority if the user is already running other jobs on the cluster. Thus people with few jobs have a higher chance of seeing their jobs run promptly. Additionally, the current policy limits the job duration, forcing the user to think twice before submitting anything.
Is it not enough? One might think of hacking the queueing system scheduler to give even lower priority to those who have used the cluster the most in the last period. Or we might consider buying a professional queueing system, which might give finer control over its mechanism.
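The ordering discussed above can be illustrated with a minimal sketch. It is only an illustration of the idea, not how the queueing system on the cluster is actually implemented: the `Job` class, the `order_pending` function and the usage figures are all hypothetical names and numbers made up for this example. Pending jobs are sorted first by how many jobs their owner is already running (the current policy), then by the owner's recent usage (the suggested refinement), then by submission order.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    owner: str
    submit_order: int  # lower means submitted earlier


def order_pending(pending, running_per_user, recent_cpu_hours=None):
    """Return pending jobs in the order they would be started.

    Hypothetical policy sketch:
      1. owners with fewer running jobs go first (current policy),
      2. ties are broken by lower recent usage (suggested refinement),
      3. remaining ties are broken by submission order.
    """
    recent = recent_cpu_hours or {}
    return sorted(
        pending,
        key=lambda job: (
            running_per_user.get(job.owner, 0),
            recent.get(job.owner, 0.0),
            job.submit_order,
        ),
    )


# Made-up example: alice already runs 5 jobs, bob and carol run none,
# but bob has used far more CPU than carol recently.
pending = [Job(1, "alice", 0), Job(2, "bob", 1), Job(3, "carol", 2)]
running = {"alice": 5, "bob": 0, "carol": 0}
recent = {"bob": 100.0, "carol": 10.0}

current_policy = [j.job_id for j in order_pending(pending, running)]
with_history = [j.job_id for j in order_pending(pending, running, recent)]
```

With only the current rule, bob's job starts before carol's because it was submitted earlier; once recent usage is taken into account, carol's job moves ahead of bob's. In both cases alice's job waits, since she already has jobs running.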
Is it still not enough? Then go on polyphemus.ulb.ac.be, issue qacct -o to see who has heavily used the cluster (qacct -o -d 30 to see the statistics of the last 30 days) and punish her/him :)
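For anyone who wants to turn that report into a ranking, here is a rough sketch. It assumes the report is a whitespace-separated table whose header row contains an OWNER and a CPU column, followed by a separator line; check the real output on the cluster before relying on this. The function name `rank_owners_by_cpu`, the user names and the numbers in `SAMPLE` are made up for illustration.

```python
def rank_owners_by_cpu(report):
    """Return (owner, cpu) pairs sorted by CPU usage, highest first.

    Assumes a whitespace-separated table with a header row that
    contains the column names OWNER and CPU (an assumption about
    the report layout, not a guarantee).
    """
    lines = [ln for ln in report.splitlines() if ln.strip()]
    header_idx = next(
        i for i, ln in enumerate(lines) if ln.split()[0] == "OWNER"
    )
    columns = lines[header_idx].split()
    cpu_col = columns.index("CPU")

    usage = []
    for ln in lines[header_idx + 1:]:
        if set(ln.strip()) == {"="}:
            continue  # separator line under the header
        fields = ln.split()
        if len(fields) != len(columns):
            continue  # skip anything that does not look like a data row
        usage.append((fields[0], float(fields[cpu_col])))
    return sorted(usage, key=lambda pair: pair[1], reverse=True)


# Made-up sample in the assumed layout; not real cluster data.
SAMPLE = """\
OWNER   WALLCLOCK   CPU
=======================
alice   100         250.5
bob     50          900.0
"""

ranking = rank_owners_by_cpu(SAMPLE)
```

In practice the text passed to the function would be the captured output of the command mentioned above rather than the sample string.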
I believe that, under normal circumstances, only people affiliated with IRIDIA should have accounts on the cluster. Of course, special agreements to open external accounts can be made, but this is more a political decision of the "bosses" :) There was a plan to "harden" the security of the IRIDIA network by putting every machine inside an internal network with no direct connection from the outside. I still support that view: the fewer "access points" we leave open to the outside world, the better. Therefore I think that if I collaborate with somebody external to IRIDIA, I should be responsible for the usage of IRIDIA resources, and I should launch the experiments on the cluster with my account. If the sources are provided by the external collaborator, or if the results will be analyzed by the external collaborator, nobody cares.
- let's exercise a bit of conspiracy theory: you closed the cluster in a local network; someone gives you some code to run the experiments with your account; the code contains malicious lines that jeopardize the cluster => closing the cluster in a local network does not help, and you become the bad guy who hacked the cluster :)--haiax 10:07, 25 April 2006 (CEST)
- yes, that's one thing. The other one is that the things you suggest are extremely constraining. I would agree with you if you can come up with integrated solutions, SSH tunnels and similar methods that still allow any authorized user to log in, work remotely and so on. Sometimes one does a lot of things to improve security, but that mainly makes the users' lives worse instead. Anyway, in our case, I believe that the very first security action we have to take is backup. Additionally, the focus is not security, but fairness of use. The point is that some of us would like members of IRIDIA to have easier access to the cluster than non-members. Acampo 22:13, 25 April 2006 (CEST)
- Halva, you got the idea... if you decide to collaborate with somebody and you accept code from him/her to run on a shared resource in IRIDIA, you are responsible if it does any damage. If you don't trust the guy, either you check his/her code before launching it, or you write the code yourself. Alex, as far as user access is concerned, the proposed idea was exactly the one you suggested... the use of an SSH tunnel on the IRIDIA firewall. Max 09:47, 26 April 2006
Rodi's two cents
I would propose deleting the accounts on the cluster 3-6 months after the person has left. I think Anders is right when he argues that IRIDIA is probably the only place where people who left years ago still have access by default. Before deleting accounts for the first time, I would propose backing up the material (as the users will be shocked by this sudden change of policy).
Concerning what Halva wrote: this is indeed a nice comment, and I can already see how it will be used to automatically pillory the most elaborate researchers in the context of the "Monthly report on IRIDIA Cluster CPU usage" :-)
Max' comment -- part 2
As far as the "fairness" of usage of the shared resources is concerned, the policy was designed around the instruments we have at the moment. The queueing system we use does not allow fine-grained control (or if it does we don't know how to do it ;) ): in the configuration panel there is only a generic "max number of jobs a user can concurrently run", but no way of defining classes of users with different limits. I think that, as Halva suggested, we should consider the possibility of buying a more "powerful" queueing system (maybe even the same one, Sun Grid Engine, but the commercial version that has more features).