
[farmshare-discuss] Fw: Jobs Aborting Early in Farmshare 2

Rehman Ali rali8 at stanford.edu
Tue Jan 2 14:05:08 PST 2018




________________________________
From: Rehman Ali
Sent: Tuesday, January 2, 2018 2:04 PM
To: srcc-support
Subject: Jobs Aborting Early in Farmshare 2


Dear SRCC,


On the old Farmshare (corn), I was able to run parallel executions of some MATLAB code that takes roughly 8 hours per thread.


However, on Farmshare 2, the new system keeps aborting the job roughly two hours into each thread. This is the message I get in my error files:


slurmstepd-wheat01: error: *** JOB 140407 ON wheat01 CANCELLED AT 2018-01-02T12:19:05 DUE TO TIME LIMIT ***


Does anyone know what could be causing this error? How short is the time limit?

According to the User Guide (https://web.stanford.edu/group/farmshare/cgi-bin/wiki/index.php/User_Guide), the maximum runtime should be 2 days, so why are my jobs canceled after only 2 hours?



Rehman Ali

National Defense Science and Engineering Graduate (NDSEG) Fellow

Electrical Engineering PhD Candidate | Stanford University

Computational and Mathematical Engineering M.S. Student | Stanford University

B.S. Biomedical Engineering, 2016 | Georgia Institute of Technology

Graduate Student Researcher in Jeremy Dahl Ultrasound Lab | Stanford University

