spark-user mailing list archives

From Himanshu Mehra <>
Subject Re: Limit Spark Shuffle Disk Usage
Date Tue, 16 Jun 2015 09:49:07 GMT
Hi Al M,

You could try providing more memory to the shuffle process, which should
reduce the spill to disk. By default the shuffle memory fraction is 20% of
the safe memory, i.e. about 16% of the overall heap (0.2 x 0.8 safety
fraction). So when we set executor memory, only a small fraction of it is
available to the shuffle process, which causes more and more spilling to
disk. Fortunately, that fraction is configurable, and you can give the
shuffle more memory by setting two properties:

1 : set 'spark.storage.memoryFraction' to 0.4 (the default is 0.6)

2 : set 'spark.shuffle.memoryFraction' to 0.4 (the default is 0.2)

This should make a significant difference in the shuffle's disk usage.
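For example, assuming Spark 1.x's legacy memory manager, the two properties
could be set on the SparkConf when the context is created (a sketch only;
the app name is illustrative and the 0.4 values are a starting point to tune
against your own job):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Spark 1.x legacy memory manager: shrink the storage pool (cached RDDs)
// and grow the shuffle pool so aggregations spill to disk less often.
val conf = new SparkConf()
  .setAppName("shuffle-tuning") // illustrative name
  .set("spark.storage.memoryFraction", "0.4") // default 0.6
  .set("spark.shuffle.memoryFraction", "0.4") // default 0.2
val sc = new SparkContext(conf)
```

The same settings can be passed on the command line instead, e.g.
`spark-submit --conf spark.storage.memoryFraction=0.4 --conf
spark.shuffle.memoryFraction=0.4`, which avoids hard-coding them in the
application.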

Thank you

Himanshu Mehra
