spark-dev mailing list archives

From Imran Rashid <im...@therashids.com>
Subject Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling
Date Wed, 20 Mar 2019 18:01:02 GMT
Thanks for sending the updated docs.  Can you please give everyone the
ability to comment?  I have some comments, but overall I think this is a
good proposal and addresses my prior concerns.

My only real concern is that I notice some mention of "must dos" for spark
3.0.  I don't want to make any commitment to holding spark 3.0 for parts of
this; I think that is an entirely separate decision.  However, I'm guessing
this is just a minor wording issue, and you really mean that's a minimal
set of features you are aiming for, which is reasonable.

On Mon, Mar 18, 2019 at 12:56 PM Xingbo Jiang <jiangxb1987@gmail.com> wrote:

> Hi all,
>
> I updated the SPIP doc
> <https://docs.google.com/document/d/1C4J_BPOcSCJc58HL7JfHtIzHrjU0rLRdQM3y7ejil64/edit#>
> and stories
> <https://docs.google.com/document/d/12JjloksHCdslMXhdVZ3xY5l1Nde3HRhIrqvzGnK_bNE/edit#heading=h.udyua28eu3sg>,
> I hope they now contain a clear scope of the changes and enough detail for
> the SPIP vote.
> Please review the updated docs, thanks!
>
> On Wed, Mar 6, 2019 at 8:35 AM Xiangrui Meng <mengxr@gmail.com> wrote:
>
>> How about letting Xingbo make a major revision to the SPIP doc to make it
>> clear what is proposed? I like Felix's suggestion to switch to the new
>> Heilmeier template, which helps clarify what is proposed and what is not.
>> Then let's review the new SPIP and resume the vote.
>>
>> On Tue, Mar 5, 2019 at 7:54 AM Imran Rashid <imran@therashids.com> wrote:
>>
>>> OK, I suppose we are getting bogged down in what a vote on an
>>> SPIP means anyway, which I guess we can set aside for now.  With the
>>> level of detail in this proposal, I feel like there is a reasonable chance
>>> I'd still -1 the design or implementation.
>>>
>>> And the other thing you're implicitly asking the community for is to
>>> prioritize this feature for continued review and maintenance.  There is
>>> already work to be done in things like making barrier mode support dynamic
>>> allocation (SPARK-24942), bugs in failure handling (eg. SPARK-25250), and
>>> general efficiency of failure handling (eg. SPARK-25341, SPARK-20178).  I'm
>>> very concerned about getting spread too thin.
>>>
>>>
>>> But if this is really just a vote on (1) is better gpu support important
>>> for spark, in some form, in some release? and (2) is it *possible* to do
>>> this in a safe way?  then I will vote +0.
>>>
>>> On Tue, Mar 5, 2019 at 8:25 AM Tom Graves <tgraves_cs@yahoo.com> wrote:
>>>
>>>> So to me most of the questions here are implementation/design
>>>> questions. I've had this issue in the past with SPIPs where I expected to
>>>> have more high-level design details but was basically told that belongs in
>>>> the design JIRA follow-on. This makes me think we need to revisit what a
>>>> SPIP really needs to contain, which should be done in a separate thread.
>>>> Note personally I would be for having more high-level details in it.
>>>> But the way I read our documentation on a SPIP right now, that detail is
>>>> all optional. Maybe we could argue it's based on what reviewers request,
>>>> but really perhaps we should make the wording of that more required.
>>>> Thoughts?  We should probably separate that discussion if people want to
>>>> talk about that.
>>>>
>>>> For this SPIP in particular the reason I +1 it is because it came down
>>>> to 2 questions:
>>>>
>>>> 1) do I think spark should support this -> my answer is yes, I think
>>>> this would improve spark, users have been requesting both better GPU
>>>> support and support for controlling container requests at a finer
>>>> granularity for a while.  If spark doesn't support this then users may go
>>>> to something else, so I think we should support it
>>>>
>>>> 2) do I think it's possible to design and implement it without causing
>>>> large instabilities?   My opinion here again is yes. I agree with Imran and
>>>> others that the scheduler piece needs to be looked at very closely as we
>>>> have had a lot of issues there and that is why I was asking for more
>>>> details in the design jira:
>>>> https://issues.apache.org/jira/browse/SPARK-27005.  But I do believe
>>>> it's possible to do.
>>>>
>>>> If others have reservations on similar questions then I think we should
>>>> resolve here or take the discussion of what a SPIP is to a different thread
>>>> and then come back to this, thoughts?
>>>>
>>>> Note there is already a high level design for at least the core piece,
>>>> which is what people seem concerned with, so including it in the SPIP
>>>> should be straightforward.
>>>>
>>>> Tom
>>>>
>>>> On Monday, March 4, 2019, 2:52:43 PM CST, Imran Rashid <
>>>> imran@therashids.com> wrote:
>>>>
>>>>
>>>> On Sun, Mar 3, 2019 at 6:51 PM Xiangrui Meng <mengxr@gmail.com> wrote:
>>>>
>>>> On Sun, Mar 3, 2019 at 10:20 AM Felix Cheung <felixcheung_m@hotmail.com>
>>>> wrote:
>>>>
>>>> IMO upfront allocation is less useful. Specifically too expensive for
>>>> large jobs.
>>>>
>>>>
>>>> This is also an API/design discussion.
>>>>
>>>>
>>>> I agree with Felix -- this is more than just an API question.  It has a
>>>> huge impact on the complexity of what you're proposing.  You might be
>>>> proposing big changes to a core and brittle part of spark, which is already
>>>> short of experts.
>>>>
>>>> I don't see any value in having a vote on "does feature X sound cool?"
>>>> We have to evaluate the potential benefit against the risks the feature
>>>> brings and the continued maintenance cost.  We don't need super low-level
>>>> details, but we have to have a sketch of the design to be able to make that
>>>> tradeoff.
>>>>
>>>
