[farmshare-discuss] USR1 -> aborted Matlab job

Alex Chekholko chekh at stanford.edu
Mon Sep 24 14:18:43 PDT 2012


Hi David,

There is a default job limit of 48 hours on FarmShare.  You'll want to 
request more time for your jobs.  Use 'qsub -l h_rt=hh:mm:ss ...'; for 
the 72 hours you mention, that would be h_rt=72:00:00.
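
To illustrate the h_rt format, here is a minimal Python sketch that turns a runtime in hours into the hh:mm:ss string and assembles the qsub arguments. The helper names and the script name run_matlab.sh are made up for this example; only 'qsub -l h_rt=...' comes from the note above.

```python
def format_h_rt(hours):
    """Format a runtime given in hours as an h_rt value (hh:mm:ss)."""
    if hours <= 0:
        raise ValueError("runtime must be positive")
    total_seconds = int(round(hours * 3600))
    h, rem = divmod(total_seconds, 3600)
    m, s = divmod(rem, 60)
    # Hours are not capped at 24; a 72-hour request is written 72:00:00.
    return f"{h:02d}:{m:02d}:{s:02d}"


def qsub_args(hours, script):
    """Build the qsub argument list; 'script' is a hypothetical job script."""
    return ["qsub", "-l", f"h_rt={format_h_rt(hours)}", script]


if __name__ == "__main__":
    # Prints the command line rather than submitting anything.
    print(" ".join(qsub_args(72, "run_matlab.sh")))
```

Printing the command first, instead of running it, makes it easy to check the requested time before submitting.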

Please feel free to edit the documentation if it is not clear:
https://www.stanford.edu/group/farmshare/cgi-bin/wiki/index.php/User_Guide#job_duration

Actually I'm not exactly sure what happens if you ask for 50hrs 
(h_rt=50:00:00); the job gets placed into the long queue (30 day limit), 
but does the job get killed at 50 hrs or at the 30 day limit?  I think 
it's the latter.

Try it out, let us know.

If you request more time than 30 days, then the job will never run, 
because the scheduler won't be able to satisfy that resource request.
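
The limits above can be summarized in a small Python sketch. It only encodes the numbers mentioned in this message (48 hours by default, 30 days for the long queue); the function name, the return labels, and the handling of requests that land exactly on a limit are assumptions for illustration, not the scheduler's actual configuration.

```python
DEFAULT_LIMIT_HOURS = 48            # default job limit mentioned above
LONG_QUEUE_LIMIT_HOURS = 30 * 24    # 30-day limit of the long queue


def classify_request(hours):
    """Say where an h_rt request of the given length would land."""
    if hours <= 0:
        raise ValueError("runtime must be positive")
    if hours <= DEFAULT_LIMIT_HOURS:
        return "default limit"
    if hours <= LONG_QUEUE_LIMIT_HOURS:
        # Assumption: a request of exactly 30 days still fits the long queue.
        return "long queue"
    # No queue can satisfy the request, so the job would wait forever.
    return "never scheduled"


if __name__ == "__main__":
    for h in (24, 50, 72, 31 * 24):
        print(h, "hours:", classify_request(h))
```

This says nothing about when a long-queue job is actually killed, which is the open question above.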

Regards,
Alex

On 09/24/2012 01:37 PM, David Melbourne Blum wrote:
> Hi folks,
>
> I got a notification that a Matlab job of mine was aborted because signal USR1 was received. I checked the job's error log, and no error was generated. When I googled USR1, what I read made it sound like this signal is used by various Linux programs to trigger various different processes.
>
> The job had been running for exactly 48 hrs and 2 seconds when it aborted, which leads me to wonder whether the job "timed out" after 48 hrs. Does anyone know if qsub / Barley is set to time out Matlab jobs after 48 hrs? If so, is there a way to request a longer block of time, and can someone tell me what I should add to my script to request the time? I believe this job would have finished in another 12-24 hrs, so I'd like to request a total of 72 hrs. Or, if the timing of the abort is entirely coincidental, does anyone know how qsub uses USR1 and what could have triggered it?
>
> Many thanks,
> David
>
> David Blum
> Ph.D candidate, Decision and Risk Analysis Group
> Department of Management Science & Engineering
> Predoctoral Science Fellow
> Center for International Security and Cooperation
> Stanford University
>
> 510-414-4450 (m)
> 415-230-0645 (skype)
> 815-301-3500 (fax)
> dmblum at stanford.edu
> http://cisac.stanford.edu/people/davidblum/
>
> _______________________________________________
> farmshare-discuss mailing list
> farmshare-discuss at lists.stanford.edu
> https://mailman.stanford.edu/mailman/listinfo/farmshare-discuss
>

-- 
Alex Chekholko chekh at stanford.edu 347-401-4860


