spark-dev mailing list archives

From Felix Cheung <felixcheun...@hotmail.com>
Subject Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling
Date Mon, 04 Mar 2019 07:13:12 GMT
Once again, I’d have to agree with Sean.

Let’s table the meaning of SPIP for another time, say. I think a few of us are trying to
understand what “accelerator resource aware” means. As far as I know, no one is discussing
the API here. But on the Google doc, on JIRA, on email, and off list, I have seen questions
that are greatly concerning, like “oh, the scheduler is allocating GPUs, but how does that
affect memory?” and many more, so I think finer “high level” goals should be defined.




________________________________
From: Sean Owen <srowen@gmail.com>
Sent: Sunday, March 3, 2019 5:24 PM
To: Xiangrui Meng
Cc: Felix Cheung; Xingbo Jiang; Yinan Li; dev; Weichen Xu; Marco Gaido
Subject: Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

I think treating SPIPs as this high-level takes away much of the point
of VOTEing on them. I'm not sure that's even what Reynold is
suggesting elsewhere; we're nowhere near discussing APIs here, just
what 'accelerator aware' even generally means. If the scope isn't
specified, what are we trying to bind with a formal VOTE? The worst I
can say is that this doesn't mean much, so the outcome of the vote
doesn't matter. The general idea seems fine to me and I support
_something_ like this.

I think the subtext concern is that SPIPs become a way to request
cover to make a bunch of decisions separately, later. This is, to some
extent, how it has to work. A small number of interested parties need
to decide the details coherently, not design the whole thing by
committee, with occasional check-ins for feedback. There's a balance
between that, and using the SPIP as a license to go finish a design
and proclaim it later. That's not anyone's bad-faith intention, just
the risk of deferring so much.

Mesos support is not a big deal by itself but a fine illustration of
the point. That seems like a fine question of scope now, even if the
'how' or some of the 'what' can be decided later. I raised an eyebrow
here at the reply that this was already judged out-of-scope: how much
are we on the same page about this being a point to consider feedback?

If one wants to VOTE on more details, then this vote just doesn't
matter much. Is a future step to VOTE on some more detailed design
doc? Then that's what I call a "SPIP" and it's practically just
semantics.


On Sun, Mar 3, 2019 at 6:51 PM Xiangrui Meng <mengxr@gmail.com> wrote:
>
> Hi Felix,
>
> Just to clarify, we are voting on the SPIP, not the companion scoping doc. What is proposed
> and what we are voting on is to make Spark accelerator-aware. The companion scoping doc and
> the design sketch are there to help demonstrate what features could be implemented based on
> the use cases and dev resources the co-authors are aware of. The exact scoping and design
> would require more community involvement; by no means are we finalizing them in this vote
> thread.
>
> I think copying the goals and non-goals from the companion scoping doc to the SPIP caused
> the confusion. As mentioned in the SPIP, we proposed to make two major changes at high level:
>
> - At cluster manager level, we update or upgrade cluster managers to include GPU support.
>   Then we expose user interfaces for Spark to request GPUs from them.
> - Within Spark, we update its scheduler to understand available GPUs allocated to executors,
>   user task requests, and assign GPUs to tasks properly.
>
> We should keep our vote discussion at this level. It doesn't exclude Mesos/Windows/TPU/FPGA,
> nor does it commit to supporting YARN/K8s. Through the initial scoping work, we found that we
> certainly need domain experts to discuss the support of each cluster manager and each
> accelerator type. But adding more details on Mesos or FPGA doesn't change the SPIP at high
> level. So we concluded the initial scoping, shared the docs, and started this vote.
