spark-user mailing list archives

From Jasleen Kaur <jasleenkaur1...@gmail.com>
Subject Writing to HDFS
Date Mon, 03 Aug 2015 19:49:17 GMT
I am executing a Spark job on a cluster in yarn-client mode (yarn-cluster is
not an option due to permission issues) with the following settings; the full
submit command is sketched after the list.

   - num-executors 800
   - spark.akka.frameSize=1024
   - spark.default.parallelism=25600
   - driver-memory=4G
   - executor-memory=32G
   - My input size is around 1.5TB.
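
For reference, the full submit command looks roughly like this (the jar, main
class and paths are placeholders; everything else matches the settings above):

    spark-submit \
      --master yarn-client \
      --num-executors 800 \
      --driver-memory 4G \
      --executor-memory 32G \
      --conf spark.akka.frameSize=1024 \
      --conf spark.default.parallelism=25600 \
      --class com.example.MyJob \
      my-job.jar hdfs:///path/to/input hdfs:///path/to/output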

My problem is that when I execute rdd.saveAsTextFile(outputPath,
classOf[org.apache.hadoop.io.compress.SnappyCodec]), I get a heap space
error. Saving as Avro is also not an option; I have tried saveAsSequenceFile
with GZIP and saveAsNewAPIHadoopFile with the same result. On the other hand,
if I execute rdd.take(1) I get no such issue, so I am assuming the problem is
caused by the write.
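
For clarity, the write calls in question look roughly like this (the paths,
app name and the keying used for the sequence-file variant are placeholders;
the save calls themselves are the ones described above):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.hadoop.io.compress.{GzipCodec, SnappyCodec}

    val sc = new SparkContext(new SparkConf().setAppName("hdfs-write"))
    val rdd = sc.textFile("hdfs:///path/to/input")   // ~1.5TB of input
    val outputPath = "hdfs:///path/to/output"

    // This is the call that fails with the heap space error:
    rdd.saveAsTextFile(outputPath, classOf[SnappyCodec])

    // Also tried, with the same result: sequence files compressed with GZIP,
    // written from an illustrative (key, value) view of the same data.
    rdd.map(line => (line.take(10), line))
       .saveAsSequenceFile(outputPath + "-seq", Some(classOf[GzipCodec]))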
