[farmshare-discuss] Barley sharing algorithm?

Ricardo Pinho rpinho at stanford.edu
Tue Aug 14 16:03:34 PDT 2012


A fair share scheduling policy might not be the only thing missing. I appreciate the need to improve the sharing algorithm and people's urgency in running jobs. But I would also appreciate it if users were not put on the spot on this public mailing list. It's extremely rude and unnecessary. The user could have made his point with exactly the same urgency and agency, as is his right, without mentioning my name, or other users' names in past postings. Admins may consider moderating. They may also consider using accounting (qacct) to evaluate users' usage. Thank you for all your work and for running the cluster for us.
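
As a rough illustration of evaluating usage in aggregate, the sketch below tallies running jobs per user from text in the column layout that qstat prints in the quoted message further down. It is only a sketch under stated assumptions: the function name running_jobs_per_user is hypothetical, the parser assumes job names contain no spaces, and the sample rows are copied from the quoted qstat output. It does not call qstat or qacct itself.

```python
from collections import Counter


def running_jobs_per_user(qstat_text):
    """Count running jobs per user from qstat-style text (hypothetical helper)."""
    counts = Counter()
    for line in qstat_text.splitlines():
        fields = line.split()
        # Job rows start with a numeric job ID; skip headers and blank lines.
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        # Assumed column order: job-ID, priority, name, user, state, ...
        user, state = fields[3], fields[4]
        if state == "r":
            counts[user] += 1
    return counts


# Sample rows copied from the qstat output quoted in this thread.
sample = """\
326816 0.75165 m3r05p2118 rpinho       r     08/10/2012 08:45:04 precise.q at barley17.stanford.ed     1
326817 0.75165 m3r05p2119 rpinho       r     08/10/2012 09:21:34 precise.q at barley01.stanford.ed     1
326846 0.75164 XX         mmurray1     qw    08/09/2012 10:20:14                                    1 1-196:1
"""

for user, n in running_jobs_per_user(sample).most_common():
    print(user, n)
```

A tally like this reports every user at once, so it can be posted or reviewed without singling out one account.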

On Aug 10, 2012, at 10:09 PM, Ruth Marinshaw wrote:

> Good evening. As Alex noted in an earlier message, we will indeed implement a fair share scheduling policy for Barley.  I (the management referenced in someone's note) cannot provide a specific date at this point, as we have other commitments, in progress, to complete.  But I anticipate that this change can be made in the next week or so.  We are all in agreement that it needs to be done and it is the right thing to do, given that this is a shared, limited resource.  Other queue parameter changes (max jobs, etc.) may also be considered.   
> 
> We are confirming whether there are additional system/application changes that need to be made in conjunction with the scheduling policy change and will provide timing details to this list as we have them.  If the timeline is delayed for some reason, we will look at other  interventions to try to address the situation that you all are facing and make job starts/executions more equitable.
> 
> I appreciate your suggestions, feedback and ideas, as we develop strategies to tailor these relatively new, albeit limited, services to better meet the community's computational research needs.  Please contact me directly (ruthm at stanford.edu) if you have additional ideas or needs that you would like to discuss.
> 
> Best regards,
> 
> Ruth Marinshaw
> Research Computing
> 
> 
> 
> 
> ----- Original Message -----
> From: "Michael Maxwell Murray" <mmurray1 at stanford.edu>
> To: "Open discussion for users of FarmShare" <farmshare-discuss at lists.stanford.edu>
> Sent: Friday, August 10, 2012 10:31:08 AM
> Subject: Re: [farmshare-discuss] Barley sharing algorithm?
> 
> Hello Alex,
> 
> Another data point on why fairshare is needed. I know you have lots to 
> do, so I can accept that fairshare won't be implemented for a couple
> more weeks. However, I wanted to provide a succinct example of why
> fairshare is needed, so that the implementation is not delayed.
> After reading this, can you get some assurance from your management
> as to a firm date when fairshare can be implemented?
> 
> Right now, user rpinho has 348 jobs running.
> 
> corn05: qstat -u '*' | grep ' r ' | grep rpinho | wc
>    348    3132   38628
> 
> Other users are running 20 jobs:
> 
> corn05: qstat -u '*' | grep ' r ' | grep -v rpinho | wc
>     20     182    2212
> 
> I have jobs that I would like to run that have been queued for a day:
> From qstat:
> 
> 326846 0.75164 XX         mmurray1     qw    08/09/2012 10:20:14                                    1 1-196:1
> 
> Also, user tflanzer has 66 jobs that have been queued for 18 hours. Yet the most recent 
> jobs to start belong to rpinho:
> 
> 326816 0.75165 m3r05p2118 rpinho       r     08/10/2012 08:45:04 precise.q at barley17.stanford.ed     1        
> 326817 0.75165 m3r05p2119 rpinho       r     08/10/2012 09:21:34 precise.q at barley01.stanford.ed     1        
> 326818 0.75165 m3r05p2120 rpinho       r     08/10/2012 09:30:04 precise.q at barley06.stanford.ed     1        
> 326819 0.75165 m3r05p2121 rpinho       r     08/10/2012 09:49:34 precise.q at barley11.stanford.ed     1    
> 
> Note that only 5 jobs have started in the last hour and a half, i.e. the queue is clearing slowly.
> Furthermore, rpinho has 25 more jobs in the queue that look like they will start ahead of
> other users. (To be fair, at 4:00 p.m. this afternoon, rpinho's older jobs will start
> to hit the 2 day limit, and the queue will start to clear faster.) 
> 
> It seems to me that when a slot becomes available, SGE should be starting jobs belonging to
> other users besides rpinho. Also, I have no problem with rpinho wanting to run a lot of jobs.
> It's just that I would like to see the computing power allocated more evenly.
> 
> Thank you,
> Mike Murray
> Ph.D. Candidate
> Civil and Environmental Engineering
> 
> 
> 
> 
> 
> ----- Original Message -----
> From: "Alex Chekholko" <chekh at stanford.edu>
> To: farmshare-discuss at lists.stanford.edu
> Sent: Monday, August 6, 2012 10:38:08 AM
> Subject: Re: [farmshare-discuss] Barley sharing algorithm?
> 
> Hi all,
> 
> Thank you for your suggestions.  We will implement basic fairshare in a 
> couple of weeks and send an announcement.
> 
> Regards,
> Alex
> 
> On 8/3/12 9:36 AM, Tomas Babak wrote:
>> I agree, it would be great to implement fairshare that prioritizes jobs
>> starting based on the USER CPU usage rather than just when the job was
>> submitted - especially given the heavy usage of barley. This does not seem
>> to be happening yet, but I think it was a planned implementation?
>> 
>> Tomas
>> 
>> 
>> 
>> 
>> 
>> 
>> -----Original Message-----
>> From: farmshare-discuss-bounces at lists.stanford.edu
>> [mailto:farmshare-discuss-bounces at lists.stanford.edu] On Behalf Of Michael
>> Maxwell Murray
>> Sent: Friday, August 03, 2012 9:23 AM
>> To: Open discussion for users of FarmShare
>> Subject: [farmshare-discuss] Barley sharing algorithm?
>> 
>> Hello,
>> 
>> Can someone explain the algorithm that Barley uses to allocate
>> slots to users? Currently, there are 444 jobs running. 413
>> jobs belong to the user ocarja, including the most recently
>> launched jobs. Several other users have jobs that have been
>> queued for more than a day (e.g. blhuynh, rpinho, tflanzer).
>> 
>> Given that ocarja's jobs are consuming a large fraction of
>> the CPUs and that there are other users waiting for a significant
>> time, when a slot becomes available, why isn't a different user's
>> job started?
>> 
>> Thank you,
>> Mike Murray
>> Ph.D. Candidate
>> Civil and Environmental Engineering
>> _______________________________________________
>> farmshare-discuss mailing list
>> farmshare-discuss at lists.stanford.edu
>> https://mailman.stanford.edu/mailman/listinfo/farmshare-discuss
>> 
>> 
> 
> -- 
> Alex Chekholko chekh at stanford.edu 347-401-4860
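
The behaviour requested in this thread (when a slot frees up, start a job from a user who currently has fewer jobs running, instead of going strictly by submission order) can be sketched as follows. This is a toy model under stated assumptions, not SGE's actual scheduling policy: the names pick_next_job and fill_slots are hypothetical, the job IDs in the example are made up, and "usage" is reduced to a count of running jobs.

```python
from collections import Counter


def pick_next_job(pending, running):
    """Pick the pending job whose owner has the fewest running jobs.

    pending: list of (job_id, user) tuples in submission order.
    running: Counter mapping user name to number of running jobs.
    Ties are broken by submission order. Returns None if nothing is pending.
    """
    if not pending:
        return None
    best = min(range(len(pending)), key=lambda i: (running[pending[i][1]], i))
    return pending[best]


def fill_slots(pending, running_users, free_slots):
    """Start up to free_slots jobs and return their job IDs in start order."""
    pending = list(pending)           # work on copies, leave inputs untouched
    running = Counter(running_users)  # one entry per running job
    started = []
    for _ in range(free_slots):
        job = pick_next_job(pending, running)
        if job is None:
            break
        pending.remove(job)
        running[job[1]] += 1
        started.append(job[0])
    return started


# Hypothetical queue: job IDs are invented for illustration.
queue = [(1, "rpinho"), (2, "rpinho"), (3, "mmurray1"), (4, "tflanzer"), (5, "tflanzer")]
print(fill_slots(queue, ["rpinho"] * 348, free_slots=3))
```

With strict submission order the two rpinho jobs would start first; under this toy rule the three free slots go to the waiting users, and rpinho's pending jobs start once the others have caught up or the queue has drained.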
