spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Adrian Tanase <atan...@adobe.com>
Subject Re: Shuffle Write v/s Shuffle Read
Date Fri, 02 Oct 2015 11:26:03 GMT
I’m not sure this is related to memory management – the shuffle is the central act of moving
data around nodes when the computations need the data on another node (E.g. Group by, sort,
etc)

Shuffle read and shuffle write should be mirrored on the left/right side of a shuffle between
2 stages.

-adrian

From: Kartik Mathur
Date: Thursday, October 1, 2015 at 10:36 PM
To: user
Subject: Shuffle Write v/s Shuffle Read

Hi

I am trying to better understand shuffle in spark .

Based on my understanding thus far ,

Shuffle Write : writes stage output for intermediate stage on local disk if memory is not
sufficient.,
Example , if each worker has 200 MB memory for intermediate results and the results are 300MB
then , each executer will keep 200 MB in memory and will write remaining 100 MB on local disk
.

Shuffle Read : Each executer will read from other executer's memory + disk , so total read
in above case will be 300(200 from memory and 100 from disk)*num of executers ?

Is my understanding correct ?

Thanks,
Kartik

Mime
View raw message