spark-user mailing list archives

From Ameet Kini <ameetk...@gmail.com>
Subject Re: Saving compressed sequence files
Date Wed, 28 Aug 2013 13:56:03 GMT
Folks,

Still stuck on this, so I would greatly appreciate any pointers on how to
force Spark to recognize the mapred.output.compression.type Hadoop
parameter.

Thanks,
Ameet


On Mon, Aug 26, 2013 at 6:09 PM, Ameet Kini <ameetkini@gmail.com> wrote:

>
> I'm trying to use saveAsSequenceFile to output compressed sequence files
> where the value in each (key, value) pair is compressed. In Hadoop, I would
> set the job configuration parameter
> "mapred.output.compression.type=RECORD" for record-level compression.
> Previous posts have suggested that this is possible by simply setting this
> parameter in core-site.xml. I tried doing just that, but the resulting
> sequence file doesn't appear to be compressed.
>
> I've also tried setting
> spark.hadoop.mapred.output.compression.type as a system property just
> before initializing the Spark context:
> System.setProperty("spark.hadoop.mapred.output.compression.type", "RECORD")
>
> In both cases, I can see that the resulting configuration, as reported by
> SparkContext.hadoopConfiguration, has the property set to RECORD, but the
> values in the resulting sequence file are still uncompressed.
>
> At first I thought this was because io.compression.codecs was set to
> null, so I set io.compression.codecs to the long list of codecs that is its
> normal default in a Hadoop environment, but still to no avail. Am I
> missing a crucial step?
>
> Thanks,
> Ameet
>
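[Editor's note, not part of the original thread: one detail worth checking is that `mapred.output.compression.type` only selects *how* output is compressed (RECORD vs. BLOCK); compression is not enabled at all unless `mapred.output.compress` is also set to `true`. A sketch of one way to force the settings through, assuming a Spark version where `saveAsHadoopFile` accepts an explicit `JobConf` (the output path and local master are illustrative, not from the thread):]

```scala
import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.io.compress.DefaultCodec
import org.apache.hadoop.mapred.{JobConf, SequenceFileOutputFormat}
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._  // pair-RDD implicits for saveAsHadoopFile

val sc = new SparkContext("local", "compressed-seqfile-sketch")

// Build an explicit JobConf so the compression settings travel with the
// write itself, rather than relying on core-site.xml or system properties.
val conf = new JobConf(sc.hadoopConfiguration)
conf.set("mapred.output.compress", "true")            // must be true, or the
                                                      // compression.type is ignored
conf.set("mapred.output.compression.type", "RECORD")  // per-record compression
conf.set("mapred.output.compression.codec",
         classOf[DefaultCodec].getName)               // codec to apply

val rdd = sc.parallelize(Seq((1, "a"), (2, "b")))
  .map { case (k, v) => (new IntWritable(k), new Text(v)) }

// saveAsHadoopFile with the explicit JobConf, instead of saveAsSequenceFile.
rdd.saveAsHadoopFile("/tmp/compressed-seq",
  classOf[IntWritable], classOf[Text],
  classOf[SequenceFileOutputFormat[IntWritable, Text]], conf)
```

[This is a sketch only; it requires a Spark/Hadoop deployment to run, and package names varied across early Spark releases.]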
