spark-user mailing list archives

From Ameet Kini <>
Subject Re: Saving compressed sequence files
Date Wed, 28 Aug 2013 13:56:03 GMT

Still stuck on this, so I would greatly appreciate any pointers on how to
force Spark to recognize the mapred.output.compression.type Hadoop property.
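
One thing worth double-checking (a guess on my part, not something confirmed in this thread): record-level SequenceFile compression only takes effect when output compression is enabled at all via mapred.output.compress, in addition to mapred.output.compression.type, and a codec must be resolvable. A minimal core-site.xml fragment along those lines might look like this (the DefaultCodec choice here is just an example):

```xml
<!-- Sketch: enable record-level SequenceFile output compression.
     mapred.output.compression.type is ignored unless
     mapred.output.compress is also set to true. -->
<property>
  <name>mapred.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapred.output.compression.type</name>
  <value>RECORD</value>
</property>
<property>
  <name>mapred.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec</value>
</property>
```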


On Mon, Aug 26, 2013 at 6:09 PM, Ameet Kini <> wrote:

> I'm trying to use saveAsSequenceFile to output compressed sequence files
> where the "value" in each (key, value) pair is compressed. In Hadoop, I would
> set the job configuration parameter
> "mapred.output.compression.type=RECORD" for record-level compression.
> Previous posts have suggested that this is possible by simply setting this
> parameter in core-site.xml. I tried doing just that, but the sequence
> file doesn't seem to be compressed.
> I've also tried setting
> spark.hadoop.mapred.output.compression.type as a system property just
> before initializing the Spark context:
> System.setProperty("spark.hadoop.mapred.output.compression.type", "RECORD")
> In both cases, I can see that the resulting configuration as per
> SparkContext.hadoopConfiguration has the property set to RECORD, but the
> resulting sequence file still has its value uncompressed.
> At first, I thought this was because io.compression.codecs was set to
> null, so I set io.compression.codecs to the long list of codecs that is its
> normal default value in a Hadoop environment, but still to no avail. Am I
> missing a crucial step?
> Thanks,
> Ameet
