spark-user mailing list archives

From Alex Rovner <alex.rov...@magnetic.com>
Subject Re: Why dataframe.persist(StorageLevels.MEMORY_AND_DISK_SER) hangs for long time?
Date Sun, 11 Oct 2015 01:24:41 GMT
How many executors are you running with? How many nodes in your cluster?
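
For a sense of scale, here is a rough back-of-the-envelope sketch using the figures from the quoted message below (about 1 TB of data, 200 shuffle partitions). It assumes the data is spread evenly across partitions and ignores compression and serialization overhead. The 128 MB target partition size and the helper names are illustrative assumptions, not values taken from the thread or from Spark itself.

```python
import math

# Binary size units, in bytes.
MB = 1024 ** 2
GB = 1024 ** 3
TB = 1024 ** 4


def partition_size_bytes(total_bytes: int, num_partitions: int) -> float:
    """Average bytes per partition, assuming a perfectly even distribution."""
    return total_bytes / num_partitions


def partitions_for_target(total_bytes: int, target_bytes: int) -> int:
    """Smallest partition count that keeps the average partition at or below the target."""
    return math.ceil(total_bytes / target_bytes)


total = 1 * TB  # "around 1 TB", from the quoted message

# With the default spark.sql.shuffle.partitions = 200:
per_partition_gb = partition_size_bytes(total, 200) / GB

# Hypothetical target of 128 MB per partition (an assumption, not a Spark default):
suggested = partitions_for_target(total, 128 * MB)

print(f"average partition size at 200 partitions: {per_partition_gb:.2f} GB")
print(f"partitions needed for a 128 MB target: {suggested}")
```

If the estimate holds, each of the 200 partitions would carry several gigabytes, which may be part of why the stage is slow and memory-hungry. The executor and node counts asked about above are still needed to judge how much memory each task actually gets.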

On Thursday, October 8, 2015, unk1102 <umesh.kacha@gmail.com> wrote:

> Hi, as recommended I am caching my Spark job's DataFrame with
> dataframe.persist(StorageLevels.MEMORY_AND_DISK_SER), but in the Spark
> job UI I see that the persist stage runs for a very long time, showing
> 10 GB of shuffle read and 5 GB of shuffle write. It takes too long to
> finish, and because of that my Spark job sometimes times out or throws
> OOM, and the executors get killed by YARN. I am using Spark 1.4.1 with
> all sorts of optimizations such as Tungsten and Kryo. I have set
> storage.memoryFraction to 0.2 and storage.shuffle to 0.2 as well. My data
> is huge, around 1 TB, and I am using the default 200 partitions for
> spark.sql.shuffle.partitions. Please help; I am out of ideas and would
> appreciate any guidance.

-- 
*Alex Rovner*
*Director, Data Engineering *
*o:* 646.759.0052

* <http://www.magnetic.com/>*
