spark-dev mailing list archives

From Felix Cheung <>
Subject Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling
Date Sun, 03 Mar 2019 18:20:35 GMT
Great points Sean.

Here’s what I’d like to suggest to move forward.
Split the SPIP.

If we want to propose upfront homogeneous allocation (aka spark.task.gpus), this should be
one SPIP on its own, and, for instance, I really agree with Sean (as I did in the discuss thread)
that we can’t simply declare Mesos a non-goal. We have enough maintenance issues as it is. And IIRC
there was a PR proposed for K8S; I’d like to see that discussion brought here as well.

IMO upfront allocation is less useful; specifically, it is too expensive for large jobs.
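To make the cost concern concrete: a rough sketch of task-slot math under a single job-wide GPU setting. The spark.task.gpus name is from the SPIP; the helper function below is purely illustrative and not part of any Spark API.

```python
# Sketch: concurrent task slots per executor under upfront (job-wide)
# resource settings. Illustrative only; not a Spark API.

def task_slots(executor_cores, executor_gpus, task_cpus, task_gpus):
    """Concurrent tasks one executor can run. Once task_gpus > 0, GPUs
    become the limiting resource for every stage, GPU-using or not."""
    cpu_slots = executor_cores // task_cpus
    if task_gpus == 0:
        return cpu_slots
    gpu_slots = executor_gpus // task_gpus
    return min(cpu_slots, gpu_slots)

# 16 cores and 2 GPUs per executor; 1 CPU and 1 GPU per task:
# the whole job runs at 2 concurrent tasks per executor, so any
# CPU-only ETL stage idles 14 of its 16 cores for the job's lifetime.
print(task_slots(16, 2, 1, 1))  # -> 2
print(task_slots(16, 2, 1, 0))  # -> 16
```

This is the waste that a per-stage request would avoid: only the stages that need GPUs would pay the GPU slot limit.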

If we want per-stage resource requests, this should be a full SPIP with a lot more details to
be hashed out. Our work with Horovod brings a few specific and critical requirements on how
this should work with distributed DL, and I would like to see those addressed.

In any case I’d like to see more consensus before moving forward; until then I’m going
to -1 this.

From: Sean Owen <>
Sent: Sunday, March 3, 2019 8:15 AM
To: Felix Cheung
Cc: Xingbo Jiang; Yinan Li; dev; Weichen Xu; Marco Gaido
Subject: Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

I'm for this in general, at least a +0. I do think this has to have a
story for what to do with the existing Mesos GPU support, which sounds
entirely like the spark.task.gpus config here. Maybe it's just a
synonym? That kind of thing.

Requesting different types of GPUs might be a bridge too far, but,
that's a P2 detail that can be hashed out later. (For example, if a
v100 is available and k80 was requested, do you use it or fail? is the
right level of resource control GPU RAM and cores?)
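The v100-vs-k80 question can be framed as a policy choice. A hypothetical sketch (neither the policy names nor the function exist in Spark; this only maps the decision space):

```python
# Hypothetical GPU-type matching policies for the v100/k80 question.
# Nothing here is a real Spark name; it only frames the trade-off.

def match_gpu(requested, available, policy="exact"):
    """Return the GPU type to allocate, or None to fail the request."""
    if requested in available:
        return requested
    if policy == "exact":
        return None                      # k80 asked, only v100 present: fail
    if policy == "any":
        # best effort: take whatever accelerator is there
        return available[0] if available else None
    raise ValueError(f"unknown policy: {policy}")

print(match_gpu("k80", ["v100"], policy="exact"))  # -> None (fail fast)
print(match_gpu("k80", ["v100"], policy="any"))    # -> v100 (best effort)
```

"Exact" is predictable but can strand jobs in heterogeneous clusters; "any" keeps jobs running but can silently over-provision (a v100 doing k80 work), which is why resource granularity (GPU RAM, cores) matters to the question.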

The per-stage resource requirements sound like the biggest change;
you can even change the CPU cores requested per pandas UDF? And what about
memory then? We'll see how that shakes out. That's the only thing I'm
kind of unsure about in this proposal.

On Sat, Mar 2, 2019 at 9:35 PM Felix Cheung <> wrote:
> I’m very hesitant with this.
> I don’t want to vote -1, because I personally think it’s important to do, but I’d like to see more discussion points addressed and not vote purely on the spirit of it.
> First, the SPIP doesn’t match the SPIP format proposed and agreed on. (Maybe this is a minor point, and perhaps we should also vote to update the SPIP format.)
> Second, there are multiple pdf/google docs and JIRAs. And I think, for example, the design sketch does not cover the same points as the updated SPIP doc? It would help to make them align before moving forward.
> Third, the proposal touches on some fairly core and sensitive components, like the scheduler, and I think more discussions are necessary. We have a few comments there and in the JIRA.
> ________________________________
> From: Marco Gaido <>
> Sent: Saturday, March 2, 2019 4:18 AM
> To: Weichen Xu
> Cc: Yinan Li; Tom Graves; dev; Xingbo Jiang
> Subject: Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling
> +1, a critical feature for AI/DL!
> On Sat, Mar 2, 2019 at 05:14 Weichen Xu <> wrote:
>> +1, nice feature!
>> On Sat, Mar 2, 2019 at 6:11 AM Yinan Li <> wrote:
>>> +1
>>> On Fri, Mar 1, 2019 at 12:37 PM Tom Graves <> wrote:
>>>> +1 for the SPIP.
>>>> Tom
>>>> On Friday, March 1, 2019, 8:14:43 AM CST, Xingbo Jiang <> wrote:
>>>> Hi all,
>>>> I want to call for a vote on SPARK-24615. It improves Spark by making it aware of GPUs exposed by cluster managers, so that Spark can properly match GPU resources with user task requests. The proposal and production doc were made available on dev@ to collect input. You can also find a design sketch at SPARK-27005.
>>>> The vote will be up for the next 72 hours. Please reply with your vote:
>>>> +1: Yeah, let's go forward and implement the SPIP.
>>>> +0: Don't really care.
>>>> -1: I don't think this is a good idea because of the following technical reasons.
>>>> Thank you!
>>>> Xingbo
