spark-user mailing list archives

From Kartik Mathur <kar...@bluedata.com>
Subject Re: How does shuffle work in spark ?
Date Tue, 20 Oct 2015 17:04:04 GMT
That will depend on what your transformation is; a code snippet would
help.
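
For reference, here is a minimal plain-Python sketch of the map-side write and reduce-side read described in the quoted message below. It is not Spark code: the function names (hash_partition, map_side_write, reduce_side_read) and the sample records are hypothetical, in-memory dictionaries stand in for the shuffle files on local disk, and it assumes a hash partitioner that assigns a key to partition hash(key) % num_partitions.

```python
from collections import defaultdict


def hash_partition(key, num_partitions):
    # Assumption: the partition is the non-negative modulo of the key's hash.
    return hash(key) % num_partitions


def map_side_write(map_inputs, num_partitions):
    """Each map task groups its records into one bucket per reduce partition.

    The returned list has one entry per map task; each entry stands in for
    the shuffle output that task would write to its local disk.
    """
    shuffle_files = []
    for records in map_inputs:
        buckets = defaultdict(list)
        for key, value in records:
            buckets[hash_partition(key, num_partitions)].append((key, value))
        shuffle_files.append(buckets)
    return shuffle_files


def reduce_side_read(shuffle_files, partition_id):
    """A reducer fetches its own bucket from every map task's output."""
    fetched = []
    for buckets in shuffle_files:
        fetched.extend(buckets.get(partition_id, []))
    return fetched


# Hypothetical input: two map tasks holding (key, data) pairs.
map_inputs = [
    [(1, "a"), (2, "b"), (3, "c")],
    [(1, "d"), (4, "e")],
]
num_partitions = 2

shuffle_files = map_side_write(map_inputs, num_partitions)
partitions = [reduce_side_read(shuffle_files, p) for p in range(num_partitions)]
```

In this sketch every record is written once on the map side and read once on the reduce side, so all records with the same key end up in the same partition. In a real job those reads happen over the network.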



On Tue, Oct 20, 2015 at 1:53 AM, shahid ashraf <shahid@trialx.com> wrote:

> Hi
>
> Any idea why there is 50 GB of shuffle read and write for 3.3 GB of data?
>
> On Mon, Oct 19, 2015 at 11:58 PM, Kartik Mathur <kartik@bluedata.com>
> wrote:
>
>> That sounds like correct shuffle output. In Spark, the map and reduce phases
>> are separated by a shuffle: in the map phase each executor writes its output
>> to local disk, and in the reduce phase the reducers read that data from each
>> executor over the network. So a shuffle definitely hurts performance. For
>> more details on the Spark shuffle phase, please read this:
>>
>> http://0x0fff.com/spark-architecture-shuffle/
>>
>> Thanks
>> Kartik
>>
>> On Mon, Oct 19, 2015 at 6:54 AM, shahid <shahid@trialx.com> wrote:
>>
>>> @all I did a partitionBy using the default hash partitioner on data of the
>>> form [(1, data), (2, data), ..., (n, data)].
>>> The total data was approx. 3.5 GB, but it showed a shuffle write of 50 GB,
>>> and the next action (e.g. count) shows a shuffle read of 50 GB. I don't
>>> understand this behaviour, and I think performance is getting slow with so
>>> much shuffle read on the subsequent transformation operations.
>>>
>>>
>>>
>>>
>>>
>>>
>>
>
>
> --
> with Regards
> Shahid Ashraf
>
