spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Zoltán Zvara <zoltan.zv...@gmail.com>
Subject Re: Shuffle Write v/s Shuffle Read
Date Fri, 02 Oct 2015 11:38:11 GMT
Hi,

Shuffle output goes to local disk each time, as far as I know, never to
memory.

On Fri, Oct 2, 2015 at 1:26 PM Adrian Tanase <atanase@adobe.com> wrote:

> I’m not sure this is related to memory management – the shuffle is the
> central act of moving data around nodes when the computations need the data
> on another node (E.g. Group by, sort, etc)
>
> Shuffle read and shuffle write should be mirrored on the left/right side
> of a shuffle between 2 stages.
>
> -adrian
>
> From: Kartik Mathur
> Date: Thursday, October 1, 2015 at 10:36 PM
> To: user
> Subject: Shuffle Write v/s Shuffle Read
>
> Hi
>
> I am trying to better understand shuffle in spark .
>
> Based on my understanding thus far ,
>
> *Shuffle Write* : writes stage output for intermediate stage on local
> disk if memory is not sufficient.,
> Example , if each worker has 200 MB memory for intermediate results and
> the results are 300MB then , each executer* will keep 200 MB in memory
> and will write remaining 100 MB on local disk .  *
>
> *Shuffle Read : *Each executer will read from other executer's *memory +
> disk , so total read in above case will be 300(200 from memory and 100 from
> disk)*num of executers ? *
>
> Is my understanding correct ?
>
> Thanks,
> Kartik
>

Mime
View raw message