spark-user mailing list archives

From ayan guha <guha.a...@gmail.com>
Subject Re: Writing to HDFS
Date Tue, 04 Aug 2015 01:40:17 GMT
Is your data skewed? What happens if you do rdd.count()?
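
As a follow-up, one minimal way to check for skew is to count records per partition before writing. This is only a sketch, assuming the RDD is named rdd as in the original post:

    // Count records per partition to see whether a few partitions hold most of the data.
    val partitionSizes = rdd
      .mapPartitionsWithIndex { (idx, it) => Iterator((idx, it.size)) }
      .collect()

    // Print the ten largest partitions; a heavily skewed RDD will show a few
    // partitions far bigger than the rest.
    partitionSizes.sortBy { case (_, n) => -n }.take(10).foreach {
      case (idx, n) => println(s"partition $idx: $n records")
    }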
On 4 Aug 2015 05:49, "Jasleen Kaur" <jasleenkaur1291@gmail.com> wrote:

> I am executing a Spark job on a cluster in yarn-client mode (yarn-cluster is
> not an option due to permission issues).
>
>    - num-executors 800
>    - spark.akka.frameSize=1024
>    - spark.default.parallelism=25600
>    - driver-memory=4G
>    - executor-memory=32G
>    - My input size is around 1.5 TB.
>
> My problem is that when I execute rdd.saveAsTextFile(outputPath,
> classOf[org.apache.hadoop.io.compress.SnappyCodec]), I get a heap space
> error. (Saving as Avro is also not an option; I have tried
> saveAsSequenceFile with GZIP and saveAsNewAPIHadoopFile with the same
> result.) On the other hand, if I execute rdd.take(1), I get no such issue,
> so I am assuming that the issue is caused by the write.
>
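
For context, a rough sketch of the setup described above. Only the configuration values and the save call come from the post; the app name, input/output paths, and the way the RDD is built are placeholders. The num-executors, driver-memory, and executor-memory settings would be passed to spark-submit in yarn-client mode rather than set in code:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.hadoop.io.compress.SnappyCodec

    // Configuration values quoted from the post; the app name is a placeholder.
    val conf = new SparkConf()
      .setAppName("hdfs-write-job")
      .set("spark.akka.frameSize", "1024")
      .set("spark.default.parallelism", "25600")
    val sc = new SparkContext(conf)

    // Placeholder input path; the post describes roughly 1.5 TB of input.
    val rdd = sc.textFile("hdfs:///path/to/input")

    // The write that triggers the heap space error in the post.
    rdd.saveAsTextFile("hdfs:///path/to/output", classOf[SnappyCodec])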
