Hi Al M,
You could try giving more memory to the shuffle, which should reduce spilling to disk. By default, the shuffle memory fraction is 20% of the safe memory, which works out to roughly 16% of the overall heap. So when we set the executor memory, only a small fraction of it is actually available to the shuffle, which causes more and more spillage to disk. The good thing is that we can change that fraction and give more memory to the shuffle; you just need to set two properties:
1 : set 'spark.storage.memoryFraction' to 0.4 (the default is 0.6)
2 : set 'spark.shuffle.memoryFraction' to 0.4 (the default is 0.2)
This should make a significant difference in the shuffle's disk usage. You can set these on the SparkConf when you create the context, as in the sketch below, or pass them on the command line.
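
For example, a minimal sketch in the driver (the app name is just a placeholder; adjust the fractions to whatever split suits your job):

import org.apache.spark.{SparkConf, SparkContext}

// Shift part of the heap from the storage (cache) region to the shuffle region.
val conf = new SparkConf()
  .setAppName("ShuffleMemoryExample")           // placeholder app name
  .set("spark.storage.memoryFraction", "0.4")   // default is 0.6
  .set("spark.shuffle.memoryFraction", "0.4")   // default is 0.2
val sc = new SparkContext(conf)
// ... run your shuffle-heavy job with sc ...
sc.stop()

or equivalently when submitting:

spark-submit --conf spark.storage.memoryFraction=0.4 \
             --conf spark.shuffle.memoryFraction=0.4 ...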
Thank you
-
Himanshu Mehra