spark-user mailing list archives

From Michael Slavitch <>
Subject Eliminating shuffle write and spill disk IO reads/writes in Spark
Date Fri, 01 Apr 2016 18:32:05 GMT

I’m working with Spark on very large memory systems (2TB+) and notice that Spark spills
to disk during shuffle.  Is there a way to force Spark to stay in memory for shuffle operations?
  The goal is to keep the shuffle data either on the heap or in off-heap memory (in 1.6.x)
and never touch the IO subsystem.  I am willing to have the job fail if it runs out of RAM.
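For reference, this is the sort of configuration I have been trying, as a spark-defaults.conf sketch (the off-heap size below is just an illustrative value for a 2TB box, not a recommendation):

```
# Deprecated in 1.6; ignored by the tungsten-sort manager in 1.5.x (see below)
spark.shuffle.spill             false

# 1.6.x: enable off-heap memory for execution
spark.memory.offHeap.enabled    true
# Size in bytes; illustrative value for a 2TB machine
spark.memory.offHeap.size       1500000000000
```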

Setting spark.shuffle.spill is deprecated in 1.6 and has no effect under the tungsten-sort shuffle manager in 1.5.x:

"WARN UnsafeShuffleManager: spark.shuffle.spill was set to false, but this is ignored by the
tungsten-sort shuffle manager; its optimized shuffles will continue to spill to disk when necessary."

If this is impossible via configuration changes, what code changes would be needed to accomplish it?
