sqoop-user mailing list archives

From Ken Krugler <kkrugler_li...@transpac.com>
Subject Controlling compression during import
Date Sun, 04 Sep 2011 22:49:51 GMT
Hi there,

The current documentation says:
> By default, data is not compressed. You can compress your data by using the deflate (gzip)
> algorithm with the -z or --compress argument, or specify any Hadoop compression codec using
> the --compression-codec argument. This applies to both SequenceFiles or text files.
But I think this is a bit misleading.

Currently, if output compression is enabled in the cluster configuration, the Sqooped data is
always compressed, regardless of whether this flag is set.

It seems better to make compression actually controllable via --compress, which means changing
ImportJobBase.configureOutputFormat():

    if (options.shouldUseCompression()) {
      FileOutputFormat.setCompressOutput(job, true);
      FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
      SequenceFileOutputFormat.setOutputCompressionType(job,
          CompressionType.BLOCK);
    } else {
      // New: explicitly turn compression off so the cluster default does not apply.
      FileOutputFormat.setCompressOutput(job, false);
    }
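
To illustrate the difference, here is a minimal standalone sketch. It does not use Hadoop at all: java.util.Properties stands in for the job configuration, the key "mapred.output.compress" is assumed to be the property that FileOutputFormat.setCompressOutput() writes in this Hadoop generation, and the class and method names are made up for the example.

```java
import java.util.Properties;

// Hypothetical model of the behavior discussed above; not Sqoop or Hadoop code.
class CompressionToggleSketch {
    // Assumed name of the output-compression property in older Hadoop releases.
    static final String COMPRESS_KEY = "mapred.output.compress";

    // Models the current behavior: the setting is only touched when --compress is given,
    // so whatever the cluster configuration says is left in place otherwise.
    static void configureCurrent(Properties jobConf, boolean useCompression) {
        if (useCompression) {
            jobConf.setProperty(COMPRESS_KEY, "true");
        }
    }

    // Models the proposed behavior: the setting always follows the flag.
    static void configureProposed(Properties jobConf, boolean useCompression) {
        jobConf.setProperty(COMPRESS_KEY, Boolean.toString(useCompression));
    }

    // Reads the effective setting; an unset key is treated as "not compressed".
    static boolean outputCompressed(Properties jobConf) {
        return Boolean.parseBoolean(jobConf.getProperty(COMPRESS_KEY, "false"));
    }

    // Builds a job configuration that inherits a cluster-wide "compress output" default.
    static Properties jobConfWithClusterCompression() {
        Properties conf = new Properties();
        conf.setProperty(COMPRESS_KEY, "true");
        return conf;
    }

    public static void main(String[] args) {
        Properties current = jobConfWithClusterCompression();
        configureCurrent(current, false);   // user did not pass --compress
        System.out.println("current behavior, compressed:  " + outputCompressed(current));

        Properties proposed = jobConfWithClusterCompression();
        configureProposed(proposed, false); // user did not pass --compress
        System.out.println("proposed behavior, compressed: " + outputCompressed(proposed));
    }
}
```

With a cluster default of "true" and no --compress flag, the first model leaves compression on while the second turns it off, which is the change being suggested.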

Thoughts?

-- Ken

--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr



