spark-dev mailing list archives

From Michael Slavitch <slavi...@gmail.com>
Subject Eliminating shuffle write and spill disk IO reads/writes in Spark
Date Fri, 01 Apr 2016 18:32:05 GMT
Hello,

I’m working on Spark with very large memory systems (2TB+) and notice that Spark spills
to disk during shuffle.  Is there a way to force Spark to stay in memory when doing shuffle
operations?  The goal is to keep the shuffle data either on the heap or in off-heap memory
(in 1.6.x) and never touch the IO subsystem.  I am willing to have the job fail if it runs
out of RAM.

Setting spark.shuffle.spill=false is deprecated in 1.6 and does not work with the
tungsten-sort shuffle manager in 1.5.x:

"WARN UnsafeShuffleManager: spark.shuffle.spill was set to false, but this is ignored by the
tungsten-sort shuffle manager; its optimized shuffles will continue to spill to disk when
necessary."
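For reference, this is a minimal sketch of the configuration attempt described above (the
class and jar names are placeholders); on 1.5.x with tungsten-sort the flag is ignored as
the warning shows, and in 1.6 the property itself is deprecated:

```shell
# Attempt to disable shuffle spill via configuration (deprecated in 1.6,
# ignored by the tungsten-sort shuffle manager in 1.5.x).
spark-submit \
  --conf spark.shuffle.spill=false \
  --class com.example.MyJob \
  my-job.jar
```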

If this is impossible via configuration changes, what code changes would be needed to
accomplish this?

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org

