spark-dev mailing list archives

From Patrick Wendell <pwend...@gmail.com>
Subject Re: Job priority
Date Mon, 12 Jan 2015 04:34:24 GMT
Priority scheduling isn't something we've supported in Spark. We've
opted to support FIFO and Fair scheduling and have asked users to try
to fit these to the needs of their applications.

From what I've seen of priority schedulers, such as the Linux CPU
scheduler, strict priority scheduling is never used in practice because
of priority starvation and other issues. So you end up with a second
tier of heuristics that exists to deal with issues like starvation,
priority inversion, etc., and these become very complex over time.

That said, I looked at this a bit with @kayousterhout and I don't think
it would be very hard to implement a simple priority scheduler in the
current architecture. My main concern would be additional complexity
that would develop over time, based on looking at previous
implementations in the wild.

Alessandro, would you be able to open a JIRA and list some of your
requirements there? That way we could hear whether other people have
similar needs.

- Patrick

On Sun, Jan 11, 2015 at 10:07 AM, Mark Hamstra <mark@clearstorydata.com> wrote:
> Yes, if you are asking about developing a new priority queue job scheduling
> feature and not just about how job scheduling currently works in Spark, then
> that's a dev list issue.  The current job scheduling priority is at the
> granularity of pools containing jobs, not the jobs themselves; so if you
> require strictly job-level priority queuing, that would require a new
> development effort -- and one that I expect will involve a lot of tricky
> corner cases.
>
> Sorry for misreading the nature of your initial inquiry.
>
> On Sun, Jan 11, 2015 at 7:36 AM, Alessandro Baretta <alexbaretta@gmail.com>
> wrote:
>
>> Cody,
>>
>> I might be able to improve the scheduling of my jobs by using a few
>> different pools with weights equal to, say, 1, 1e3 and 1e6, effectively
>> getting a small handful of priority classes. Still, this is really not
>> quite what I am describing. This is why my original post was on the dev
>> list. Let me then ask if there is any interest in having priority queue job
>> scheduling in Spark. This is something I might be able to pull off.
>>
>> Alex
>>
>> On Sun, Jan 11, 2015 at 6:21 AM, Cody Koeninger <cody@koeninger.org>
>> wrote:
>>
>>> If you set up a number of pools equal to the number of different priority
>>> levels you want, make the relative weights of those pools very different,
>>> and submit a job to the pool representing its priority, I think you'll get
>>> behavior equivalent to a priority queue. Try it and see.
>>>
>>> If I'm misunderstanding what you're trying to do, then I don't know.
>>>
>>>
>>> On Sunday, January 11, 2015, Alessandro Baretta <alexbaretta@gmail.com>
>>> wrote:
>>>
>>>> Cody,
>>>>
>>>> Maybe I'm not getting this, but it doesn't look like this page is
>>>> describing a priority queue scheduling policy. What this section discusses
>>>> is how resources are shared between queues. A weight-1000 pool will get
>>>> 1000 times more resources allocated to it than a weight-1 pool. Great,
>>>> but not what I want. I want to be able to define an Ordering on my
>>>> tasks representing their priority, and have Spark allocate all resources
>>>> to the job that has the highest priority.
>>>>
>>>> Alex
>>>>
>>>> On Sat, Jan 10, 2015 at 10:11 PM, Cody Koeninger <cody@koeninger.org>
>>>> wrote:
>>>>
>>>>>
>>>>> http://spark.apache.org/docs/latest/job-scheduling.html#configuring-pool-properties
>>>>>
>>>>> "Setting a high weight such as 1000 also makes it possible to
>>>>> implement *priority* between pools--in essence, the weight-1000 pool
>>>>> will always get to launch tasks first whenever it has jobs active."
>>>>>
>>>>> On Sat, Jan 10, 2015 at 11:57 PM, Alessandro Baretta <
>>>>> alexbaretta@gmail.com> wrote:
>>>>>
>>>>>> Mark,
>>>>>>
>>>>>> Thanks, but I don't see how this documentation solves my problem. You
>>>>>> are referring me to documentation of fair scheduling; whereas, I am
>>>>>> asking about as unfair a scheduling policy as can be: a priority queue.
>>>>>>
>>>>>> Alex
>>>>>>
>>>>>> On Sat, Jan 10, 2015 at 5:00 PM, Mark Hamstra <mark@clearstorydata.com>
>>>>>> wrote:
>>>>>>
>>>>>>> -dev, +user
>>>>>>>
>>>>>>> http://spark.apache.org/docs/latest/job-scheduling.html
>>>>>>>
>>>>>>>
>>>>>>> On Sat, Jan 10, 2015 at 4:40 PM, Alessandro Baretta <
>>>>>>> alexbaretta@gmail.com> wrote:
>>>>>>>
>>>>>>>> Is it possible to specify a priority level for a job, such that the
>>>>>>>> active jobs might be scheduled in order of priority?
>>>>>>>>
>>>>>>>> Alex
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>
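
The distinction drawn in the thread, between weighting pools and strictly
ordering jobs by priority, can be illustrated with a small toy model. This
is only a sketch under simplifying assumptions: it is not Spark's scheduler
code, it ignores minShare and task locality, and the function names, pool
names, and slot counts are hypothetical and chosen for illustration.

```python
from typing import Dict


def weighted_fair_shares(weights: Dict[str, float], total_slots: int) -> Dict[str, float]:
    """Toy model: split slots across active pools in proportion to weight."""
    total_weight = sum(weights.values())
    return {name: total_slots * w / total_weight for name, w in weights.items()}


def strict_priority_shares(
    priorities: Dict[str, int], demand: Dict[str, int], total_slots: int
) -> Dict[str, int]:
    """Toy model: serve jobs in descending priority order.

    Each job takes as many slots as it can use before any lower-priority
    job receives anything.
    """
    remaining = total_slots
    shares: Dict[str, int] = {}
    for name in sorted(priorities, key=lambda n: priorities[n], reverse=True):
        shares[name] = min(demand[name], remaining)
        remaining -= shares[name]
    return shares


# Hypothetical cluster with 1001 task slots and two competing workloads.
slots = 1001
fair = weighted_fair_shares({"high": 1000, "low": 1}, slots)
strict = strict_priority_shares(
    {"high": 2, "low": 1}, {"high": 2000, "low": 500}, slots
)
print(fair)    # the low-weight pool still receives a small share
print(strict)  # the low-priority job receives nothing while "high" has demand
```

In this model a very large weight ratio approximates priority but never
drives the low-weight pool's share to zero, whereas the strict ordering
does. The second result is also the starvation case mentioned in the reply
above: the low-priority job makes no progress for as long as the
high-priority job can use every slot.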

