spark-user mailing list archives

From slavitch <slavi...@gmail.com>
Subject In-Memory Only Spark Shuffle
Date Fri, 01 Apr 2016 19:13:03 GMT
Hello,

I’m working with Spark on very large memory systems (2TB+) and notice that
Spark spills to disk during shuffle. Is there a way to force Spark to stay
exclusively in memory when doing shuffle operations? The goal is to keep
the shuffle data either on the heap or in off-heap memory (in 1.6.x) and
never touch the I/O subsystem. I am willing to have the job fail if it runs
out of RAM.

Setting spark.shuffle.spill to false is deprecated in 1.6 and is ignored by
Tungsten sort in 1.5.x:

"WARN UnsafeShuffleManager: spark.shuffle.spill was set to false, but this
is ignored by the tungsten-sort shuffle manager; its optimized shuffles will
continue to spill to disk when necessary."
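
For reference, here is a minimal sketch of the kind of configuration that
leads to this warning on 1.5.x. The app name is a placeholder, and selecting
tungsten-sort explicitly is an assumption about the setup rather than a
required step:

```scala
import org.apache.spark.SparkConf

// Sketch only: the app name is hypothetical, and the explicit
// tungsten-sort setting is an assumption about the setup.
val conf = new SparkConf()
  .setAppName("in-memory-shuffle-test")
  .set("spark.shuffle.manager", "tungsten-sort")
  // Intended to disable spilling; the warning above says that
  // tungsten-sort ignores this setting.
  .set("spark.shuffle.spill", "false")
```

With these settings the shuffle can still spill to disk, as the warning
describes.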

If this is impossible via configuration changes, what code changes would be
needed to accomplish it?



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/In-Memory-Only-Spark-Shuffle-tp26661.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


