[farmshare-discuss] Barley sharing algorithm?

J. Brannon Gary jbgary at stanford.edu
Fri Aug 10 10:58:26 PDT 2012


I'm not exactly sure how the fairshare plan would work. I often submit jobs through scripting, so it is easy to submit large numbers of jobs at one time (100-600 multiple-processor jobs, with completion times ranging from 1 to 18 hours). I can understand Mike's frustration with jobs waiting in line. The most likely implementation of fairshare would be to limit the maximum number of processors or jobs one user can have running simultaneously. However, this could be inefficient: at times in the last month I have seen low levels of use, and much of the system would sit idle at those times. Frankly, the high number of queued jobs has been an issue only for the past week or so.

Would it be possible instead for a user's queued jobs to drop in priority once that user's running jobs occupy some percentage of the cluster's capacity (probably in the 30-50% range)? The large percentage would allow people to finish large data sets in a timely manner while maintaining availability for other users to run jobs. That way, people scripting or using other methods of submitting large numbers of jobs would be able to maximize the efficiency of the system in low-usage times while not preventing others from doing their work. I understand this may be more difficult to implement, but I wanted to share my thoughts.

I look forward to feedback from others on this idea.
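
The threshold idea above can be illustrated with a small reporting helper. This is only a sketch under stated assumptions, not a scheduler configuration: `users_over_share` is a hypothetical function name, the total slot count is supplied by hand, and the slot count is assumed to be column 9 of running-job lines in `qstat -u '*'` output. It reports who is over the share; it does not change any job's priority.

```shell
# Hypothetical helper: list users whose running jobs hold more than a
# given percentage of the cluster's slots. Reads `qstat -u '*'` output
# on stdin.
# Assumed column layout for running jobs:
#   job-ID prior name user state date time queue@host slots
# (In real qstat output queue@host is a single field; the archive
# rewrites "@" as " at ", which would shift the columns.)
users_over_share() {
    total_slots=$1   # total slots in the cluster (assumed input)
    max_percent=$2   # threshold in percent, e.g. 40
    awk -v total="$total_slots" -v pct="$max_percent" '
        $5 == "r" { used[$4] += $9 }      # sum slots of running jobs per user
        END {
            for (u in used)
                if (used[u] * 100 > total * pct)
                    print u, used[u]      # user and slots currently held
        }'
}

# Usage on a live system (not run here):
#   qstat -u '*' | users_over_share 444 40
```

A scheduler-side policy would still be needed to act on this information; the helper only makes the 30-50% threshold concrete.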

Brannon

J. Brannon Gary, Ph.D.
NIH Postdoctoral Fellow
Stack Lab
903-312-3820
Department of Chemistry
Stanford University
Mailbox #132
Mudd Building, Room 121
333 Campus Drive
Stanford, CA 94305-5080


----- Original Message -----
From: "Michael Maxwell Murray" <mmurray1 at stanford.edu>
To: "Open discussion for users of FarmShare" <farmshare-discuss at lists.stanford.edu>
Sent: Friday, August 10, 2012 10:31:08 AM
Subject: Re: [farmshare-discuss] Barley sharing algorithm?

Hello Alex,

Another data point on why fairshare is needed. I know you have lots to 
do, so I can accept that fairshare won't be implemented for a couple
more weeks. However, I wanted to provide a succinct example of why
fairshare is needed, so that the implementation is not delayed.
After reading this, can you get some assurance from your management
as to a firm date when fairshare can be implemented?

Right now, user rpinho has 348 jobs running.

corn05: qstat -u '*' | grep ' r ' | grep rpinho | wc
    348    3132   38628

Other users are running 20 jobs:

corn05: qstat -u '*' | grep ' r ' | grep -v rpinho | wc
     20     182    2212
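
The two counts above can also be produced for every user in one pass. A minimal sketch, assuming the default `qstat` column layout seen in this message (user in column 4, state in column 5); `count_running_by_user` is a hypothetical helper name:

```shell
# Hypothetical helper: tally running jobs per user from `qstat -u '*'`
# output read on stdin, busiest user first.
# Assumes the column layout shown in this thread:
#   job-ID prior name user state ...
count_running_by_user() {
    awk '$5 == "r" { n[$4]++ } END { for (u in n) print n[u], u }' | sort -rn
}

# Usage on a live system (not run here):
#   qstat -u '*' | count_running_by_user
```

Matching on the state column, rather than grepping for ' r ', avoids counting a line just because some other field happens to contain a lone "r".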

I have jobs that I would like to run that have been queued for a day:
From qstat:

 326846 0.75164 XX         mmurray1     qw    08/09/2012 10:20:14                                    1 1-196:1

Also, user tflanzer has 66 jobs that have been queued for 18 hours. Yet the most recent 
jobs to start belong to rpinho:

 326816 0.75165 m3r05p2118 rpinho       r     08/10/2012 08:45:04 precise.q at barley17.stanford.ed     1        
 326817 0.75165 m3r05p2119 rpinho       r     08/10/2012 09:21:34 precise.q at barley01.stanford.ed     1        
 326818 0.75165 m3r05p2120 rpinho       r     08/10/2012 09:30:04 precise.q at barley06.stanford.ed     1        
 326819 0.75165 m3r05p2121 rpinho       r     08/10/2012 09:49:34 precise.q at barley11.stanford.ed     1    

Note that only 5 jobs have started in the last hour and a half, i.e. the queue is clearing slowly.
Furthermore, rpinho has 25 more jobs in the queue that look like they will start ahead of
other users' jobs. (To be fair, at 4:00 p.m. today, rpinho's older jobs will start
to hit the 2 day limit, and the queue will start to clear faster.) 

It seems to me that when a slot becomes available, SGE should be starting jobs belonging to
other users besides rpinho. Also, I have no problem with rpinho wanting to run a lot of jobs.
It's just that I would like to see the computing power allocated more evenly.

Thank you,
Mike Murray
Ph.D. Candidate
Civil and Environmental Engineering





----- Original Message -----
From: "Alex Chekholko" <chekh at stanford.edu>
To: farmshare-discuss at lists.stanford.edu
Sent: Monday, August 6, 2012 10:38:08 AM
Subject: Re: [farmshare-discuss] Barley sharing algorithm?

Hi all,

Thank you for your suggestions.  We will implement basic fairshare in a 
couple of weeks and send an announcement.

Regards,
Alex

On 8/3/12 9:36 AM, Tomas Babak wrote:
> I agree, it would be great to implement fairshare that prioritizes jobs
> starting based on the USER CPU usage rather than just when the job was
> submitted - especially given the heavy usage of barley. This does not seem
> to be happening yet but I think was a planned implementation?
>
> Tomas
>
>
>
>
>
>
> -----Original Message-----
> From: farmshare-discuss-bounces at lists.stanford.edu
> [mailto:farmshare-discuss-bounces at lists.stanford.edu] On Behalf Of Michael
> Maxwell Murray
> Sent: Friday, August 03, 2012 9:23 AM
> To: Open discussion for users of FarmShare
> Subject: [farmshare-discuss] Barley sharing algorithm?
>
> Hello,
>
> Can someone explain the algorithm that Barley uses to allocate
> slots to users? Currently, there are 444 jobs running. 413
> jobs belong to the user ocarja, including the most recently
> launched jobs. Several other users have jobs that have been
> queued for more than a day (e.g. blhuynh, rpinho, tflanzer).
>
> Given that ocarja's jobs are consuming a large fraction of
> the CPUs and that there are other users waiting for a significant
> time, when a slot becomes available, why isn't a different user's
> job started?
>
> Thank you,
> Mike Murray
> Ph.D. Candidate
> Civil and Environmental Engineering
> _______________________________________________
> farmshare-discuss mailing list
> farmshare-discuss at lists.stanford.edu
> https://mailman.stanford.edu/mailman/listinfo/farmshare-discuss
>
>

-- 
Alex Chekholko chekh at stanford.edu 347-401-4860
_______________________________________________
farmshare-discuss mailing list
farmshare-discuss at lists.stanford.edu
https://mailman.stanford.edu/mailman/listinfo/farmshare-discuss


