[farmshare-discuss] USR1 -> aborted Matlab job

David Melbourne Blum dmblum at stanford.edu
Mon Sep 24 13:37:09 PDT 2012


Hi folks,

I got a notification that a Matlab job of mine was aborted because signal USR1 was received. I checked the job's error log, and it contained no errors. When I googled USR1, what I read made it sound like this signal is used by various Linux programs to trigger different, program-specific actions.

The job had been running for exactly 48 hrs and 2 seconds when it aborted, which leads me to wonder whether it "timed out" at a 48-hr limit. Does anyone know if qsub / Barley is set to time out Matlab jobs after 48 hrs? If so, is there a way to request a longer block of time, and what should I add to my script to request it? I believe this job would have finished in another 12-24 hrs, so I'd like to request a total of 72 hrs. Or, if the timing of the abort is entirely coincidental, does anyone know how qsub uses USR1 and what could have triggered it?

Many thanks,
David

David Blum
Ph.D candidate, Decision and Risk Analysis Group
Department of Management Science & Engineering
Predoctoral Science Fellow
Center for International Security and Cooperation
Stanford University

510-414-4450 (m)
415-230-0645 (skype)
815-301-3500 (fax)
dmblum at stanford.edu
http://cisac.stanford.edu/people/davidblum/



