nutch-dev mailing list archives

From Andrzej Bialecki ...@getopt.org>
Subject Re: [jira] Commented: (NUTCH-634) Patch - Nutch - Hadoop 0.17.0
Date Mon, 30 Jun 2008 20:46:22 GMT
Lincoln Ritter wrote:
> Just to clarify: Andrzej, the resolution you speak of in 0.19 - is
> that resolution independent of Michael's patch?

Yes, this is something that will be submitted in a separate Hadoop JIRA 
issue.

> 
> I think any solution with less code is preferable, so a configuration
> change seems like a great way to go.  (I didn't realize one could
> change hadoop parameters from the nutch config!)

Nutch configuration files are loaded after the Hadoop config files, so any 
property that Hadoop has not already declared "final" can be overridden 
from the Nutch config. Usually you don't notice this, because Nutch uses 
property names that don't collide with Hadoop property names. Also, this 
mechanism was a bit different in older versions of Hadoop, where whole 
resources were declared "final" instead of individual properties.
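A minimal sketch of the per-property rule described above, assuming a simplified model: resources are applied in load order (Hadoop first, then Nutch), a later value replaces an earlier one, and a property an earlier resource marked "final" is left untouched. The classes `LayeredConfig` and `ConfigLayeringDemo` and the `example.*` property names are hypothetical; this is not Hadoop's `Configuration` class, and it does not model the older whole-resource "final" behaviour.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of layered config loading with per-property "final". */
class LayeredConfig {
    private final Map<String, String> values = new HashMap<>();
    private final Set<String> finalKeys = new HashSet<>();

    /** Applies one resource; finalInThisResource lists the keys it marks final. */
    void addResource(Map<String, String> props, Set<String> finalInThisResource) {
        for (Map.Entry<String, String> e : props.entrySet()) {
            String key = e.getKey();
            if (finalKeys.contains(key)) {
                continue; // an earlier resource locked this property
            }
            values.put(key, e.getValue()); // later resource wins
            if (finalInThisResource.contains(key)) {
                finalKeys.add(key);
            }
        }
    }

    String get(String key) {
        return values.get(key);
    }
}

class ConfigLayeringDemo {
    public static void main(String[] args) {
        LayeredConfig conf = new LayeredConfig();
        // "Hadoop" resource, loaded first; example.b is declared final
        conf.addResource(Map.of("example.a", "hadoop-value", "example.b", "hadoop-locked"),
                         Set.of("example.b"));
        // "Nutch" resource, loaded later, tries to override both
        conf.addResource(Map.of("example.a", "nutch-value", "example.b", "nutch-value"),
                         Set.of());
        System.out.println(conf.get("example.a")); // nutch-value
        System.out.println(conf.get("example.b")); // hadoop-locked
    }
}
```

The point of the model is only the ordering: because the Nutch files come last, they win for every property that Hadoop has not locked.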

> That being said, well-defined
> Hadoop behavior shouldn't break Nutch,

But that's the problem - this Hadoop feature is ill-defined, and it even 
breaks internal Hadoop classes such as MapFileOutputFormat.getReaders().

>  so I think exposing a public
> interface for "special" files (like hidden files) is a good
> idea.  Nutch mysteriously breaking because it can't determine its
> input properly seems much more confusing (to a user anyway) than an
> additional few lines of code.
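A rough sketch of the kind of "few lines of code" being discussed: when a reader enumerates every entry of a job output directory and treats each one as a data part, one extra non-part entry is enough to break it. The helper below skips entries whose names start with `_` or `.`, assuming that prefix convention marks "special" files. `OutputParts` and its methods are hypothetical and work on a local directory via `java.nio.file`; this is neither Hadoop's `FileSystem` API nor the code of `MapFileOutputFormat.getReaders()`, and the `_special` entry name is illustrative.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: lists data parts of an output dir, skipping special entries. */
class OutputParts {
    /** Assumed convention: names starting with '_' or '.' are not data parts. */
    static boolean isSpecial(Path p) {
        String name = p.getFileName().toString();
        return name.startsWith("_") || name.startsWith(".");
    }

    static List<Path> listParts(Path outputDir) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(outputDir)) {
            for (Path child : stream) {
                if (!isSpecial(child)) {
                    parts.add(child);
                }
            }
        }
        Collections.sort(parts); // deterministic order: part-00000, part-00001, ...
        return parts;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("job-output");
        Files.createDirectory(dir.resolve("part-00000"));
        Files.createDirectory(dir.resolve("part-00001"));
        Files.createDirectory(dir.resolve("_special")); // illustrative special entry
        for (Path p : listParts(dir)) {
            System.out.println(p.getFileName()); // prints only the two part dirs
        }
    }
}
```

Whether such filtering belongs in each caller or behind a shared public interface is exactly the trade-off debated in this thread; the sketch only shows what the filtering itself amounts to.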

Well, generally speaking I agree - but in this particular case it's a 
Hadoop mis-feature that needs to be avoided for the time being. We can't 
fix this bug in Hadoop 0.17 or 0.18, only in 0.19 (and then perhaps it 
can be backported to 0.17.1 or 0.18.1).


-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

