spark-user mailing list archives

From Himanshu Mehra <himanshumehra....@gmail.com>
Subject Re: Limit Spark Shuffle Disk Usage
Date Tue, 16 Jun 2015 09:49:07 GMT
Hi Al M,

You should try providing more memory to the shuffle process; that should
reduce the spill to disk. The default shuffle memory fraction is 20% of the
safe memory, which works out to roughly 16% of the overall heap. So when we
set the executor memory, only a small fraction of it is actually available
to the shuffle process, which induces more and more spillage to disk. The
good news is that we can change that fraction and give more memory to the
shuffle; you just need to set two properties:

1 : set 'spark.storage.memoryFraction' to 0.4 (the default is 0.6)

2 : set 'spark.shuffle.memoryFraction' to 0.4 (the default is 0.2)

This should make a significant difference in the shuffle's disk usage.
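
For instance, here is a minimal sketch of setting both properties
programmatically (this assumes Spark 1.x, where these legacy fractions
apply; the app name is just a placeholder):

    import org.apache.spark.{SparkConf, SparkContext}

    // Shift heap away from cached storage and toward shuffle buffers
    val conf = new SparkConf()
      .setAppName("ShuffleHeavyJob")               // placeholder name
      .set("spark.storage.memoryFraction", "0.4")  // default 0.6
      .set("spark.shuffle.memoryFraction", "0.4")  // default 0.2
    val sc = new SparkContext(conf)

You can also pass the same settings on the command line, e.g.
spark-submit --conf spark.storage.memoryFraction=0.4
--conf spark.shuffle.memoryFraction=0.4 ...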

Thank you

-
Himanshu Mehra





