spark-user mailing list archives

From Thomas Lavocat <Thomas.Lavo...@univ-grenoble-alpes.fr>
Subject Re: [Spark Streaming MEMORY_ONLY] Understanding Dataflow
Date Thu, 05 Jul 2018 06:48:53 GMT
Excerpts from Prem Sure's message of 2018-07-04 19:39:29 +0530:
> Hoping the below helps clear some of this up.
> Executors don't have any way to share data among themselves, except
> for sharing accumulators with the driver's support.
> It's all based on whether the data is local or remote; tasks/stages are
> defined to process it, which may result in a shuffle.

If I understand correctly:

* Only shuffle data goes through the driver
* The receivers' data stays node-local until a shuffle occurs

Is that right?

> On Wed, Jul 4, 2018 at 1:56 PM, thomas lavocat <
> thomas.lavocat@univ-grenoble-alpes.fr> wrote:
> 
> > Hello,
> >
> > I have a question on Spark dataflow. If I understand correctly, all
> > received data is sent from the executor to the driver of the application
> > prior to task creation.
> >
> > Then the tasks embedding the data transit from the driver to the executor in
> > order to be processed.
> >
> > As executors cannot exchange data among themselves, in a shuffle the data
> > also transits through the driver.
> >
> > Is that correct?
> >
> > Thomas
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe e-mail: user-unsubscribe@spark.apache.org
> >
> >
