spark-dev mailing list archives

From Gary Malouf <malouf.g...@gmail.com>
Subject Wrong temp directory when compressing before sending text file to S3
Date Thu, 06 Nov 2014 22:10:21 GMT
We have some data that we are exporting from our HDFS cluster to S3 with
some help from Spark.  The final RDD command we run is:

import org.apache.hadoop.io.compress.GzipCodec

csvData.saveAsTextFile("s3n://data/mess/2014/11/dump-oct-30-to-nov-5-gzip",
  classOf[GzipCodec])


We have 'spark.local.dir' set to the large ephemeral partition on each
slave (on EC2), but with compression enabled an intermediate file seems
to be written to /tmp/hadoop-root/s3 instead.  Is this a bug in Spark,
or are we missing a configuration property?
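
For context, the workaround we are considering is to point the s3n buffer
directory at the ephemeral partition ourselves before the write.  This is
only a sketch based on our understanding that s3n buffers output locally
under fs.s3.buffer.dir (which seems to default to ${hadoop.tmp.dir}/s3,
i.e. /tmp/hadoop-root/s3 when running as root); the /mnt path below is
just a guess at a mount point, not something we've verified:

import org.apache.hadoop.io.compress.GzipCodec

// Redirect the s3n local output buffer away from /tmp/hadoop-root/s3
// onto the large ephemeral disk ("/mnt/s3-buffer" is a placeholder path).
sc.hadoopConfiguration.set("fs.s3.buffer.dir", "/mnt/s3-buffer")

csvData.saveAsTextFile(
  "s3n://data/mess/2014/11/dump-oct-30-to-nov-5-gzip",
  classOf[GzipCodec])

If there is a Spark-side property that already covers this, we'd rather
set that instead.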


It's a problem for us because the root disks on our EC2 XL instances are small (~5 GB).
