mahout-user mailing list archives

From Lance Norskog <goks...@gmail.com>
Subject Coding gotcha in WikipediaToSequenceFile.java
Date Thu, 10 Mar 2011 09:17:17 GMT
WikipediaToSequenceFile has a coding mistake: new Job(conf) copies the
Configuration, so any conf.set() call made after that point is not
reflected in the job's configuration. If you ever uncomment the second
block of code (the compression settings below), it will not take effect.

conf.set(stuff);
conf.set(more stuff);
Job job = new Job(conf);   // conf is copied into the job here

......

    /*
     * conf.set("mapred.compress.map.output", "true");
     * conf.set("mapred.map.output.compression.type", "BLOCK");
     * conf.set("mapred.output.compress", "true");
     * conf.set("mapred.output.compression.type", "BLOCK");
     * conf.set("mapred.output.compression.codec",
     *     "org.apache.hadoop.io.compress.GzipCodec");
     */
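
To make the copy behaviour concrete, here is a minimal sketch. It does
not use Hadoop at all: Conf and JobSketch are hypothetical stand-ins for
Configuration and Job, written only so the example runs on its own. The
one assumption carried over from the real classes is that the job takes
a private copy of the configuration when it is constructed.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfCopyDemo {

    /** Hypothetical stand-in for Hadoop's Configuration: a plain key/value store. */
    static class Conf {
        private final Map<String, String> props = new HashMap<>();

        Conf() {}

        /** Copy constructor: snapshots the other Conf's current contents. */
        Conf(Conf other) {
            props.putAll(other.props);
        }

        void set(String key, String value) {
            props.put(key, value);
        }

        String get(String key) {
            return props.get(key);
        }
    }

    /** Hypothetical stand-in for Job: keeps its own copy of the Conf it is given. */
    static class JobSketch {
        private final Conf conf;

        JobSketch(Conf conf) {
            this.conf = new Conf(conf);
        }

        Conf getConfiguration() {
            return conf;
        }
    }

    public static void main(String[] args) {
        Conf conf = new Conf();

        // Set before the job is created: included in the job's copy.
        conf.set("mapred.output.compress", "true");

        JobSketch job = new JobSketch(conf);

        // Set on the original after the copy was taken: the job never sees it.
        conf.set("mapred.output.compression.type", "BLOCK");

        // Set on the job's own configuration: this one does take effect.
        job.getConfiguration().set("mapred.compress.map.output", "true");

        System.out.println(job.getConfiguration().get("mapred.output.compress"));         // true
        System.out.println(job.getConfiguration().get("mapred.output.compression.type")); // null
        System.out.println(job.getConfiguration().get("mapred.compress.map.output"));     // true
    }
}
```

In the real code the equivalent remedies would be to move the
compression settings above new Job(conf), or to set them through
job.getConfiguration() once the job exists.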


-- 
Lance Norskog
goksron@gmail.com
