spark-dev mailing list archives

From "Xuelin Cao.2015" <xuelincao2...@gmail.com>
Subject Re: When will spark support "push" style shuffle?
Date Thu, 08 Jan 2015 06:25:43 GMT
Got it. The explanation makes sense. Thank you.


On Thu, Jan 8, 2015 at 1:06 PM, Patrick Wendell [via Apache Spark
Developers List] <ml-node+s1001551n10029h8@n3.nabble.com> wrote:

> This question is conflating a few different concepts. I think the main
> question is whether Spark will have a shuffle implementation that
> streams data rather than persisting it to disk/cache as a buffer.
> Spark currently decouples the shuffle write from the read using
> disk/OS cache as a buffer. The two benefits of this approach are
> that it allows intra-query fault tolerance and it makes it easier to
> elastically scale and reschedule work within a job. We consider these
> to be design requirements (think about jobs that run for several hours
> on hundreds of machines). Impala, and similar systems like Dremel and
> F1, do not offer fault tolerance within a query at present. They also
> require gang scheduling the entire set of resources that will exist
> for the duration of a query.
>
> A secondary question is whether our shuffle should have a barrier or
> not. Spark's shuffle currently has a hard barrier between map and
> reduce stages. We haven't seen really strong evidence that removing
> the barrier is a net win. It can help the performance of a single job
> (modestly), but in a multi-tenant workload, it leads to poor
> utilization since you have a lot of reduce tasks that are taking up
> slots waiting for mappers to finish. Many large scale users of
> Map/Reduce disable this feature in production clusters for that
> reason. Thus, we haven't seen compelling evidence for removing the
> barrier at this point, given the complexity of doing so.
>
> It is possible that future versions of Spark will support push-based
> shuffles, potentially in a mode that removes some of Spark's fault
> tolerance properties. But there are many other things we can still
> optimize about the shuffle that would likely come before this.
>
> - Patrick
>
> On Wed, Jan 7, 2015 at 6:01 PM, 曹雪林 <[hidden email]> wrote:
>
> > Hi,
> >
> >       I've heard a lot of complaints about Spark's "pull" style shuffle. Is
> > there any plan to support "push" style shuffle in the near future?
> >
> >       Currently, the shuffle phase must be completed before the next stage
> > starts. In Impala, it is said, the shuffled data is "streamed" to
> > the next stage handler, which greatly saves time. Will Spark support this
> > mechanism one day?
> >
> > Thanks
>




--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Fwd-When-will-spark-support-push-style-shuffle-tp10028p10031.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.