nutch-dev mailing list archives

From Andrzej Bialecki <...@getopt.org>
Subject Re: setting number of reduce outputs problem
Date Sat, 12 Jan 2008 13:15:59 GMT
viz wrote:
> Hi. 
> In our hadoop cluster I use a configuration (set in hadoop-site.xml) to have
> mapred.reduce.tasks=2 by default.
> However, I have a few jobs where I need exactly one output from reduce (i.e.
> just part-00000). I thought it was straightforward:
> 
> JobConf job = new NutchJob(getConf());
> job.setNumReduceTasks(1);
> ...
> 
> But it seems any settings done this way are just ignored. Is that ok? Even
> official examples say it should work. Could it be we misconfigured something
> else? 
> Or is there any other way to get one data file as output? 

You should put this property in mapred-default.xml instead. In this 
version of Hadoop, settings specified in hadoop-site.xml ALWAYS override 
any other settings, including per-job settings, even those specified 
through the API.
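
For example, the cluster-wide default could be declared in 
mapred-default.xml roughly like this (a minimal sketch of a standard 
Hadoop property entry; adjust the value to your cluster):

<configuration>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>2</value>
    <description>Cluster-wide default number of reduce tasks. Because it
    lives in mapred-default.xml rather than hadoop-site.xml, individual
    jobs can still override it via the API.</description>
  </property>
</configuration>

With the default declared there, the per-job call 
job.setNumReduceTasks(1) shown above is no longer overridden and the job 
produces a single part-00000.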

-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

