spark-dev mailing list archives

From Imran Rashid <>
Subject Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling
Date Tue, 05 Mar 2019 15:54:28 GMT
OK, I suppose we are getting bogged down in what a vote on an SPIP
means anyway, which I guess we can set aside for now.  With the level
of detail in this proposal, I feel like there is a reasonable chance I'd
still -1 the design or implementation.

And the other thing you're implicitly asking the community for is to
prioritize this feature for continued review and maintenance.  There is
already work to be done in things like making barrier mode support dynamic
allocation (SPARK-24942), bugs in failure handling (e.g. SPARK-25250), and
general efficiency of failure handling (e.g. SPARK-25341, SPARK-20178).  I'm
very concerned about getting spread too thin.

But if this is really just a vote on (1) is better GPU support important
for Spark, in some form, in some release? and (2) is it *possible* to do
this in a safe way? then I will vote +0.

On Tue, Mar 5, 2019 at 8:25 AM Tom Graves <> wrote:

> So to me most of the questions here are implementation/design questions.
> I've had this issue in the past with SPIPs where I expected to have more
> high level design details but was basically told that belongs in the design
> jira follow on. This makes me think we need to revisit what a SPIP really
> needs to contain, which should be done in a separate thread.  Note
> personally I would be for having more high level details in it.
> But the way I read our documentation on a SPIP right now that detail is
> all optional. Maybe we could argue it's based on what reviewers request,
> but really perhaps we should make the wording of that more required.
> Thoughts?  We should probably separate that discussion if people want to
> talk about that.
> For this SPIP in particular the reason I +1 it is because it came down to
> 2 questions:
> 1) do I think Spark should support this -> my answer is yes, I think this
> would improve Spark, users have been requesting both better GPU support
> and support for controlling container requests at a finer granularity for a
> while.  If Spark doesn't support this then users may go to something else,
> so I think we should support it
> 2) do I think it's possible to design and implement it without causing
> large instabilities?   My opinion here again is yes. I agree with Imran and
> others that the scheduler piece needs to be looked at very closely as we
> have had a lot of issues there and that is why I was asking for more
> details in the design jira:
>  But I do believe it's
> possible to do.
> If others have reservations on similar questions then I think we should
> resolve here or take the discussion of what a SPIP is to a different thread
> and then come back to this, thoughts?
> Note there is a high level design for at least the core piece, which is
> what people seem concerned with, already so including it in the SPIP should
> be straightforward.
> Tom
> On Monday, March 4, 2019, 2:52:43 PM CST, Imran Rashid <> wrote:
> On Sun, Mar 3, 2019 at 6:51 PM Xiangrui Meng <> wrote:
> On Sun, Mar 3, 2019 at 10:20 AM Felix Cheung <>
> wrote:
> IMO upfront allocation is less useful. Specifically too expensive for
> large jobs.
> This is also an API/design discussion.
> I agree with Felix -- this is more than just an API question.  It has a
> huge impact on the complexity of what you're proposing.  You might be
> proposing big changes to a core and brittle part of spark, which is already
> short of experts.
> I don't see any value in having a vote on "does feature X sound cool?"  We
> have to evaluate the potential benefit against the risks the feature brings
> and the continued maintenance cost.  We don't need super low-level details,
> but we have to have a sketch of the design to be able to make that tradeoff.
