nutch-dev mailing list archives

From Andrzej Bialecki
Subject Re: [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore
Date Mon, 05 Jun 2006 16:50:05 GMT
Chris Mattmann wrote:
> Folks,
>  Before I (or someone else) reopens the issue, I think it's important to
> understand the implications:

I vote for re-opening. See below.

>> 1) Having a *side-effect* of the entire system stop processing after merely
>> logging a message at a certain event level is a poor practice.
> I'm not sure that the Fetcher quitting is a * side-effect * as you call it.
> In fact, I think it's clearly stated as the behavior of the system, both
> within the code, and in several mailing list conversations I've seen over
> the course of the past two years (I can dig these up, if needed).

The main problem, as Scott observed, is that the static flag affects all 
instances of the task executing inside the same JVM. If there are 
several Fetcher tasks (or any other tasks that check the SEVERE flag!) 
belonging to different jobs, all of them will quit. This is certainly 
not the intended behavior.
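A minimal sketch of the problem, using hypothetical classes rather than Nutch's actual LogFormatter or Fetcher code. The assumption is simply that a JVM-wide static boolean records "a SEVERE message was logged", and that every task checks it before each unit of work. Because the field is static, one job tripping it stops every other job in the same JVM.

```java
// Hypothetical illustration only: not Nutch's real LogFormatter/Fetcher code.
public class StaticFlagDemo {

    // Stand-in for a JVM-wide "a SEVERE message was logged" flag.
    // Because it is static, every task running in this JVM shares it.
    static volatile boolean loggedSevere = false;

    // Hypothetical minimal task that checks the shared flag before each unit of work.
    static class FetchTask {
        private int processed = 0;

        // Returns false once the task decides to stop.
        boolean step() {
            if (loggedSevere) {
                return false;
            }
            processed++;
            return true;
        }

        int processed() { return processed; }
    }

    public static void main(String[] args) {
        FetchTask jobA = new FetchTask();
        FetchTask jobB = new FetchTask();

        jobA.step();
        jobB.step();

        // Only job A runs into a severe condition...
        loggedSevere = true;

        // ...but both tasks now refuse to continue.
        System.out.println("job-A continues: " + jobA.step()); // false
        System.out.println("job-B continues: " + jobB.step()); // false
    }
}
```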

>> In fact, I believe that this would make a fantastic anti-pattern.  If this
>> kind of behavior is *really* wanted (and I argue that it should not be below),
>> it should be done through an explicit mechanism, not as a side-effect.

I have a proposal for a simple solution: set a flag in the current 
Configuration instance, and check for this flag. The Configuration 
instance provides a task-specific context that persists throughout the 
lifetime of a task, but is limited to that task alone. Voila, problem 
solved. We get rid of the dubious use of LogFormatter (I hope, Chris, that 
even you would agree that this pattern is slightly... unusual ;) ), and 
we gain a flexible mechanism limited in scope to the current task, which 
ensures isolation from other tasks in the same JVM. How about that?
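A sketch of the proposal under stated assumptions: `TaskConf` is a hypothetical minimal stand-in for a per-task Configuration (the real class is not used here), and the property name `task.severe.error` is made up for illustration. The point is only that the flag lives in an object owned by one task, so a severe error in job A no longer stops job B.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: keep the "severe error" flag in each task's own configuration
// object instead of in a JVM-wide static field.
public class ConfFlagDemo {

    // Hypothetical minimal stand-in for a per-task Configuration.
    static class TaskConf {
        private final Map<String, String> props = new HashMap<>();

        void setBoolean(String key, boolean value) {
            props.put(key, Boolean.toString(value));
        }

        boolean getBoolean(String key, boolean defaultValue) {
            String v = props.get(key);
            return v == null ? defaultValue : Boolean.parseBoolean(v);
        }
    }

    // Hypothetical property name, chosen for illustration only.
    static final String SEVERE_KEY = "task.severe.error";

    static class FetchTask {
        private final TaskConf conf;
        private int processed = 0;

        FetchTask(TaskConf conf) { this.conf = conf; }

        // Explicitly records the error in this task's own configuration,
        // instead of relying on a side-effect of logging.
        void reportSevere(String msg) {
            System.err.println("SEVERE: " + msg);
            conf.setBoolean(SEVERE_KEY, true);
        }

        // Returns false once this task has recorded a severe error.
        boolean step() {
            if (conf.getBoolean(SEVERE_KEY, false)) {
                return false;
            }
            processed++;
            return true;
        }

        int processed() { return processed; }
    }

    public static void main(String[] args) {
        // Each task gets its own configuration instance.
        FetchTask jobA = new FetchTask(new TaskConf());
        FetchTask jobB = new FetchTask(new TaskConf());

        jobA.step();
        jobB.step();

        jobA.reportSevere("unrecoverable error in job A");

        System.out.println("job-A continues: " + jobA.step()); // false
        System.out.println("job-B continues: " + jobB.step()); // true
    }
}
```

The stop decision also becomes explicit (`reportSevere`) rather than a side-effect of the log level, which addresses the anti-pattern objection quoted above.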

> I've been using Nutch in a server environment (JSPs and Tomcat) within a
> large-scale data system at NASA for the course of the past year, and have
> never been impeded by the behavior of the fetcher. Can you be more specific

Have you ever tried to run several different crawls inside the same JVM? 
That's a common requirement if you want to use Nutch as a "crawler 
component" inside a larger application. I have, and as a result of my 
bad experiences I initiated the discussion, which led to the "dynamic 
NutchConf" patches implemented by Stefan. The LogFormatter issue was 
also discussed around that time, but since we didn't have "dynamic 
NutchConf" yet, it was postponed: there was no clear idea how to 
solve it cleanly. I believe there is now.

Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration  Contact: info at sigram dot com
