I have seen the same behavior! I would love to hear an update on this...

Thanks,

Ami

On Thu, Feb 5, 2015 at 8:26 AM, Anubhav Srivastav <anubhav.srivastav@gmail.com> wrote:
Hi Kevin,
We seem to be facing the same problem as well. Were you able to find anything after that? The ticket does not seem to have made any progress.

Regards,
Anubhav

On 5 January 2015 at 10:37, 정재부 <itsjb.jung@samsung.com> wrote:

Sure, here is a ticket. https://issues.apache.org/jira/browse/SPARK-5081


------- Original Message -------

Sender : Josh Rosen <rosenville@gmail.com>

Date : 2015-01-05 06:14 (GMT+09:00)

Title : Re: Shuffle write increases in spark 1.2


If you have a small reproduction for this issue, can you open a ticket at https://issues.apache.org/jira/browse/SPARK ?



On December 29, 2014 at 7:10:02 PM, Kevin Jung (itsjb.jung@samsung.com) wrote:

Hi all,
The shuffle write size shown in the Spark web UI is very different when I
execute the same Spark job on the same input data (100GB) in Spark 1.1 and in
Spark 1.2.
At the same sortBy stage, the shuffle write is 39.7GB in Spark 1.1
but 91.0GB in Spark 1.2.
I set the spark.shuffle.manager option to hash because its default value
changed in 1.2, but Spark 1.2 still writes larger files than Spark 1.1.
Can anyone tell me why this happens?

Thanks
Kevin
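Since a small reproduction was requested, the setup described above might be sketched like this. This is a minimal sketch, not Kevin's actual job. It assumes the Scala RDD API, a plain-text input whose lines are sorted, and a master supplied via spark-submit. The object name, app name, and the input/output path arguments are hypothetical. It pins spark.shuffle.manager to hash, because Spark 1.2 changed the default to sort. Running the same jar under 1.1 and 1.2 and comparing the sortBy stage's "Shuffle Write" column in the web UI should show whether the gap reproduces.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical reproduction job; names and paths are illustrative only.
object ShuffleWriteRepro {
  // Build the configuration shared by the Spark 1.1 and 1.2 runs.
  def buildConf(): SparkConf =
    new SparkConf()
      .setAppName("shuffle-write-repro") // hypothetical app name
      // Spark 1.2 switched the default shuffle manager from "hash" to "sort";
      // pin it to "hash" so both versions take the same shuffle path.
      .set("spark.shuffle.manager", "hash")

  def main(args: Array[String]): Unit = {
    // Master URL is expected to come from spark-submit.
    val sc = new SparkContext(buildConf())
    try {
      // args(0): input path, args(1): output path (both hypothetical)
      val lines = sc.textFile(args(0))
      // sortBy introduces a shuffle; its size appears as "Shuffle Write" in the UI.
      lines.sortBy(line => line).saveAsTextFile(args(1))
    } finally {
      sc.stop()
    }
  }
}
```

Keeping the conf in its own method makes it easy to check that the setting is applied before launching the full 100GB job.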



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Shuffle-write-increases-in-spark-1-2-tp20894.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org
