nutch-dev mailing list archives

From "Chris A. Mattmann (JIRA)" <j...@apache.org>
Subject [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore
Date Thu, 15 Jun 2006 19:09:30 GMT
    [ http://issues.apache.org/jira/browse/NUTCH-258?page=comments#action_12416379 ] 

Chris A. Mattmann commented on NUTCH-258:
-----------------------------------------

> Thanks for this patch Chris - even if now it is outdate by NUTCH-303 :-(
> Since Nutch no more use the deprecated Hadoop LogFormatter, there is no more logSevere
> check in the code.

Oh Jerome. You're always trying to scoop me on stuff! ;)


> But I'm not sure all these log severe should be marked as severe (fatal level is used
> now).

Agreed. Let's review the places in the patch where severe errors are logged, and then remove/add
as deemed necessary. 


> So, what I suggest is to review all the fatal logs and check if they are really fatal
> for the whole process.

Agreed. I'll get on this right away.

> And finally, why not simply throwing a RuntimeException that will by catched the Fetcher
> if something wrong really occurs?

Because we don't want one RuntimeException killing all subsequent fetching tasks. See the
previous discussions on this by Andrzej, Scott, and me. Basically, it boils down to ensuring
that LOG.severe and its associated checking mechanism are scoped to the particular fetching
task that is executing: we believed that the best way to do that would be to use the Hadoop
Configuration (which is task specific). Make sense?
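
To make the idea concrete, here is a rough sketch of a task-scoped severe flag. It is
not the patch itself: TaskConf is a hypothetical stand-in for a task-specific
configuration object (it is not the Hadoop Configuration class), and FetchTask, the
"fetcher.severe.logged" key, and the "bad:" URL prefix are all invented for
illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a task-specific configuration (NOT the Hadoop class). */
final class TaskConf {
    private final Map<String, String> props = new HashMap<>();

    void setBoolean(String key, boolean value) {
        props.put(key, Boolean.toString(value));
    }

    boolean getBoolean(String key, boolean defaultValue) {
        String value = props.get(key);
        return value == null ? defaultValue : Boolean.parseBoolean(value);
    }
}

/** Illustrative fetch task whose "severe" flag lives in its own TaskConf. */
final class FetchTask {
    // Hypothetical property name, chosen for this sketch only.
    static final String SEVERE_KEY = "fetcher.severe.logged";

    private final TaskConf conf;

    FetchTask(TaskConf conf) {
        this.conf = conf;
    }

    /** Logs the message and records the severe state in this task's configuration. */
    void logSevere(String message) {
        System.err.println("SEVERE: " + message);
        conf.setBoolean(SEVERE_KEY, true);
    }

    boolean hasLoggedSevere() {
        return conf.getBoolean(SEVERE_KEY, false);
    }

    /** Returns how many URLs were fetched before a severe error stopped this task. */
    int run(String[] urls) {
        int fetched = 0;
        for (String url : urls) {
            if (hasLoggedSevere()) {
                break;                          // only this task exits
            }
            if (url.startsWith("bad:")) {       // assumed marker for a failing URL
                logSevere("cannot fetch " + url);
            } else {
                fetched++;
            }
        }
        return fetched;
    }
}
```

Because the flag is stored per task, a second FetchTask built with a fresh TaskConf
starts clean even after the first one has logged a severe error.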

Okey dokey, I'll work on an updated patch and submit for review soon (I won't specify an exact
date, because I'm always late ;) ).


> Once Nutch logs a SEVERE log item, Nutch fails forevermore
> ----------------------------------------------------------
>
>          Key: NUTCH-258
>          URL: http://issues.apache.org/jira/browse/NUTCH-258
>      Project: Nutch
>         Type: Bug

>   Components: fetcher
>     Versions: 0.8-dev
>  Environment: All
>     Reporter: Scott Ganyo
>     Assignee: Chris A. Mattmann
>     Priority: Critical
>  Attachments: NUTCH-258.Mattmann.060906.patch.txt, dumbfix.patch
>
> Once a SEVERE log item is written, Nutch shuts down any fetching forevermore.  This is
> from the run() method in Fetcher.java:
>     public void run() {
>       synchronized (Fetcher.this) {activeThreads++;} // count threads
>       
>       try {
>         UTF8 key = new UTF8();
>         CrawlDatum datum = new CrawlDatum();
>         
>         while (true) {
>           if (LogFormatter.hasLoggedSevere())     // something bad happened
>             break;                                // exit
>           
> Notice the last 2 lines.  This will prevent Nutch from ever Fetching again once this
> is hit as LogFormatter is storing this data as a static.
> (Also note that "LogFormatter.hasLoggedSevere()" is also checked in org.apache.nutch.net.URLFilterChecker
> and will disable this class as well.)
> This must be fixed or Nutch cannot be run as any kind of long-running service.  Furthermore,
> I believe it is a poor decision to rely on a logging event to determine the state of the application
> - this could have any number of side-effects that would be extremely difficult to track down.
> (As it has already for me.)
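
The failure mode reported above can be reproduced in isolation. The sketch below is
not the Nutch or Hadoop code: StaticSevereLog and StaticFlagFetcher are hypothetical
classes that only mimic a LogFormatter-style static flag, and the "bad:" URL prefix
is an invented marker for a fetch that fails.

```java
/** Hypothetical logger that remembers a severe event in a static field. */
final class StaticSevereLog {
    private static boolean loggedSevere = false;

    static void severe(String message) {
        System.err.println("SEVERE: " + message);
        loggedSevere = true;                    // never reset for the life of the JVM
    }

    static boolean hasLoggedSevere() {
        return loggedSevere;
    }
}

/** Illustrative fetcher that checks the static flag at the top of its loop. */
final class StaticFlagFetcher {
    /** Returns how many URLs were fetched before the loop exited. */
    int run(String[] urls) {
        int fetched = 0;
        for (String url : urls) {
            if (StaticSevereLog.hasLoggedSevere()) {
                break;                          // same check as in the quoted run()
            }
            if (url.startsWith("bad:")) {       // assumed marker for a failing URL
                StaticSevereLog.severe("cannot fetch " + url);
            } else {
                fetched++;
            }
        }
        return fetched;
    }
}
```

Once any fetcher in the JVM has called severe(), every later StaticFlagFetcher exits
on its first iteration, which is the "fails forevermore" behaviour in a long-running
service.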

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

