[farmshare-discuss] Fw: Jobs Aborting Early in Farmshare 2
rali8 at stanford.edu
Tue Jan 2 14:05:08 PST 2018
From: Rehman Ali
Sent: Tuesday, January 2, 2018 2:04 PM
Subject: Jobs Aborting Early in Farmshare 2
On the old Farmshare (corn) I was able to perform parallel executions of some MATLAB code that takes roughly 8 hours per thread.
However, on Farmshare 2 the new system keeps aborting the job roughly two hours into each thread. The message I get in my error files is this:
slurmstepd-wheat01: error: *** JOB 140407 ON wheat01 CANCELLED AT 2018-01-02T12:19:05 DUE TO TIME LIMIT ***
Does anyone know what could be causing this error? How short is the time limit?
Based on this (https://web.stanford.edu/group/farmshare/cgi-bin/wiki/index.php/User_Guide), the maximum runtime should be 2 days, so why do my jobs get canceled after only 2 hours?
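For reference, the symptom above is consistent with a default per-job time limit being applied when the submission does not request one. That is an assumption here, not something the message confirms. A minimal sketch of how one might check the limits and request a longer one with standard Slurm client tools follows. The script name run_matlab.sh and the 12-hour value are hypothetical placeholders, and the job ID is the one from the error message.

```shell
#!/bin/bash
# Sketch only: run_matlab.sh and the 12-hour limit are hypothetical placeholders.
TIME_LIMIT="12:00:00"        # requested wall time, hours:minutes:seconds
JOB_SCRIPT="run_matlab.sh"   # hypothetical batch script that launches MATLAB

if command -v sbatch >/dev/null 2>&1; then
    # List each partition with its maximum allowed job time (%P = partition, %l = time limit).
    sinfo --format="%P %l"

    # Show the limit that was actually applied to the cancelled job, next to its elapsed time.
    sacct -j 140407 --format=JobID,Timelimit,Elapsed,State

    # Resubmit with an explicit wall-time request instead of relying on the default.
    sbatch --time="$TIME_LIMIT" "$JOB_SCRIPT"
else
    echo "Slurm client tools not found; run this on a cluster login node."
fi
```

If the Timelimit column reported by sacct reads about two hours, the job most likely ran under a default rather than the partition maximum, and an explicit --time request (up to whatever the partition allows) would be the thing to try.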
National Defense Science and Engineering Graduate (NDSEG) Fellow
Electrical Engineering PhD Candidate | Stanford University
Computational and Mathematical Engineering M.S. Student | Stanford University
B.S. Biomedical Engineering, 2016 | Georgia Institute of Technology
Graduate Student Researcher in Jeremy Dahl Ultrasound Lab | Stanford University