[farmshare-discuss] termination of rogue processes
bishopj at stanford.edu
Sun Apr 28 17:34:47 PDT 2013
Hi Victor,
We have noticed these long-running processes. Thank you for killing them. I would suggest you use Grid Engine and the barley cluster for these kinds of tasks instead of corn. If you submit jobs to Grid Engine, you don't have to manage when and where processes run, because the scheduler handles that for you. Importantly, for the case you cite, Grid Engine enforces a 48-hour runtime limit by default.
I would suggest using the corn systems when you need to do interactive work such as source code editing, interactive MATLAB sessions, debugging R scripts, etc.
I think the best outcome for these shared resources comes from responsible use rather than from any countermeasures we put in place. It is quite difficult for us to tell forgotten processes apart from responsible (albeit computationally intensive) use. The person using the cluster is best equipped to monitor their own processes.
That said, if you can think of something we can do that would help you use the resource responsibly, I would be quite interested to hear it. It would be relatively easy for us to list your processes and their accumulated CPU time on the farmshare wiki, for example.
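In the meantime, a listing like that can be approximated from your own account on a given node. The following is only a minimal Python sketch, not anything FarmShare provides: the function names (`parse_cpu_time`, `long_running`, `my_processes`) are made up for illustration, and it assumes `ps` accepts the POSIX `-u` and `-o` options and prints CPU time as `[DD-]HH:MM:SS`.

```python
import getpass
import subprocess


def parse_cpu_time(field):
    """Convert a ps TIME field of the form [DD-]HH:MM:SS into seconds."""
    days = 0
    if "-" in field:
        day_part, field = field.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in field.split(":")]
    while len(parts) < 3:  # tolerate a shorter MM:SS form
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def long_running(ps_output, min_cpu_seconds):
    """Return (pid, cpu_seconds, command) rows at or above the threshold,
    sorted with the largest CPU time first."""
    rows = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        pid, cpu, command = fields
        seconds = parse_cpu_time(cpu)
        if seconds >= min_cpu_seconds:
            rows.append((int(pid), seconds, command.strip()))
    return sorted(rows, key=lambda row: row[1], reverse=True)


def my_processes(user=None):
    """Ask ps for one user's processes on this machine.
    Empty column names ('pid=' etc.) suppress the header line."""
    user = user or getpass.getuser()
    result = subprocess.run(
        ["ps", "-u", user, "-o", "pid=", "-o", "time=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling `long_running(my_processes(), 24 * 3600)` would return your processes on the current node that have accumulated at least 24 hours of CPU time. The threshold is an arbitrary choice for this example, and the check would need to be repeated on each corn node you have used.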
----- Original Message -----
From: "Victor Liu" <vkl at stanford.edu>
To: farmshare-discuss at lists.stanford.edu
Sent: Sunday, April 28, 2013 4:55:09 PM
Subject: [farmshare-discuss] termination of rogue processes
I recently found out that I had a number of processes that have been
running on corn for about a month. I only discovered this because a
friend saw that there were processes owned by me with huge CPU times.
These processes were run in the background and I had counted on the 24
hour limit to kill them, so I didn't bother to check up on them. It
seems that the policies on corn will end up revoking the Kerberos
privileges to write to disk, but the process is still left running. I
have gone through and killed all these rogue processes, but I did notice
many other processes on the nodes with similar runtimes of tens of
thousands of CPU minutes. I wonder if you can make it so that instead of
just revoking privileges, the process is also killed.