spark-user mailing list archives

From slavitch <>
Subject In-Memory Only Spark Shuffle
Date Fri, 01 Apr 2016 19:13:03 GMT

I’m working with Spark on very large memory systems (2 TB+) and notice that
Spark spills to disk during shuffle. Is there a way to force Spark to stay
exclusively in memory when doing shuffle operations? The goal is to keep the
shuffle data either on the heap or in off-heap memory (in 1.6.x) and never
touch the I/O subsystem. I am willing to have the job fail if it runs out of
RAM.

The spark.shuffle.spill setting is deprecated in 1.6, and setting it to false
is ignored by the Tungsten sort shuffle manager in 1.5.x:

"WARN UnsafeShuffleManager: spark.shuffle.spill was set to false, but this
is ignored by the tungsten-sort shuffle manager; its optimized shuffles will
continue to spill to disk when necessary."
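For reference, here is a sketch of the configuration I am attempting. The
off-heap property names are the 1.6.x ones (spark.memory.offHeap.*), and the
size value is only a placeholder for my machines, not a recommendation:

```properties
# spark-defaults.conf sketch (assumed 1.6.x property names; size is a placeholder)
spark.shuffle.spill            false   # deprecated in 1.6; ignored by tungsten-sort
spark.memory.offHeap.enabled   true    # allow off-heap memory for execution (1.6.x)
spark.memory.offHeap.size      1500g   # required to be set when offHeap.enabled is true
```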

If this is impossible via configuration changes, what code changes would be
needed to accomplish it?

