spark-user mailing list archives

From Matei Zaharia <>
Subject Re: Spark writing to disk when there's enough memory?!
Date Tue, 14 Jan 2014 19:47:21 GMT
Hey Majd,

I believe Shark sets up data to spill to disk, even though the default storage level in Spark
is memory-only. As for those executors, it looks like the data distribution was unbalanced
across them, possibly due to data locality in HDFS (some of the executors may have held more
data). One thing you can do to prevent that is to set Spark's disk-locality delays to
0 (spark.locality.wait.node=0 and spark.locality.wait.rack=0). Spark will still respect memory
locality but will not try to optimize for disk locality on HDFS.
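For reference, one way those two properties could be applied in a Spark/Shark deployment of that era is via JVM system properties at launch. This is a minimal sketch, not the only mechanism; the property names come from the message above, while the use of SPARK_JAVA_OPTS in spark-env.sh is an assumption about your setup:

```shell
# Hypothetical sketch: disable the node- and rack-level locality delays
# so tasks are scheduled immediately instead of waiting for an executor
# that holds the HDFS block locally. Typically placed in conf/spark-env.sh.
export SPARK_JAVA_OPTS="-Dspark.locality.wait.node=0 -Dspark.locality.wait.rack=0"
```

With these set, the scheduler stops deferring tasks in hope of disk-local placement, which helps spread work across executors when HDFS blocks are unevenly distributed.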


On Jan 13, 2014, at 4:24 AM, mharwida <> wrote:

> Hi All,
> I'm creating a cached table in memory via Shark using the command:
> create table tablename_cached as select * from tablename;
> Monitoring this via the Spark UI, I have noticed that data is being written
> to disk when there's clearly enough available memory on 2 of the worker
> nodes. Please refer to the attached image. Cass4 and Cass3 have 3GB of available
> memory, yet the data is being written to disk on the worker nodes which have
> used all their memory.
> Could anyone shed some light on this, please?
> Thanks
> Majd
