nutch-dev mailing list archives

From Susam Pal <susam....@gmail.com>
Subject Re: crawl-tool.xml mentions nutch-site.xml for overriding but it is not possible
Date Sat, 09 May 2009 08:37:13 GMT
On Tue, Apr 7, 2009 at 1:07 AM, Susam Pal <susam.pal@gmail.com> wrote:
> The inline documentation of 'conf/crawl-tool.xml' mentions:
>
> <!-- Do not modify this file directly.  Instead, copy entries that you -->
> <!-- wish to modify from this file into nutch-site.xml and change them -->
> <!-- there.  If nutch-site.xml does not already exist, create it.      -->
>
> However, I don't see any way of overriding the properties defined in
> 'conf/crawl-tool.xml' as 'conf/nutch-site.xml' is added to the
> configuration before 'conf/crawl-tool.xml' in the code. Here are the
> relevant code snippets:
>
> src/org/apache/nutch/crawl/Crawl.java (Lines 57 to 59) :
>
>    Configuration conf = NutchConfiguration.create();
>    conf.addResource("crawl-tool.xml");
>    JobConf job = new NutchJob(conf);
>
> src/org/apache/nutch/util/NutchConfiguration.java  (Lines 39 to 40) :
>
>    conf.addResource("nutch-default.xml");
>    conf.addResource("nutch-site.xml");
>
> So, shouldn't that XML comment be removed from 'conf/crawl-tool.xml' ?
>
> Regards,
> Susam Pal
>

I have uploaded a patch for this at:
https://issues.apache.org/jira/browse/NUTCH-735

Instead of changing the XML comments, I have changed the code so
that it behaves the way the XML comments describe.
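
To illustrate why the load order matters, here is a minimal sketch of
the "last added resource wins" behaviour. It is a toy model only,
assuming that a resource added later overrides one added earlier (which
is how non-final properties behave in Hadoop's Configuration). The
class LayeredConf, the key "example.property" and its values are
hypothetical, and this is not the code from the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Toy stand-in for a layered configuration; not the Hadoop or Nutch class. */
public class LayeredConf {
    // Resources in the order they were added.
    private final List<Map<String, String>> resources = new ArrayList<>();

    /** Adds a resource; it takes precedence over all resources added before it. */
    public void addResource(Map<String, String> props) {
        resources.add(props);
    }

    /** Returns the value from the most recently added resource that defines the key, or null. */
    public String get(String key) {
        String value = null;
        for (Map<String, String> resource : resources) {
            if (resource.containsKey(key)) {
                value = resource.get(key);
            }
        }
        return value;
    }

    public static void main(String[] args) {
        // Hypothetical contents of the two files.
        Map<String, String> nutchSite = Map.of("example.property", "from-nutch-site");
        Map<String, String> crawlTool = Map.of("example.property", "from-crawl-tool");

        // Order described in the quoted snippets: nutch-site.xml first, crawl-tool.xml last.
        LayeredConf current = new LayeredConf();
        current.addResource(nutchSite);
        current.addResource(crawlTool);
        System.out.println(current.get("example.property")); // prints "from-crawl-tool"

        // Order the XML comment implies: nutch-site.xml is consulted last, so it can override.
        LayeredConf desired = new LayeredConf();
        desired.addResource(crawlTool);
        desired.addResource(nutchSite);
        System.out.println(desired.get("example.property")); // prints "from-nutch-site"
    }
}
```

With the first ordering, a value copied into nutch-site.xml is silently
shadowed by crawl-tool.xml; with the second, the override takes effect
as the comment promises.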

Regards,
Susam Pal
