nutch-dev mailing list archives

From "Scott Ganyo (JIRA)" <j...@apache.org>
Subject [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore
Date Mon, 05 Jun 2006 14:00:30 GMT
    [ http://issues.apache.org/jira/browse/NUTCH-258?page=comments#action_12414762 ] 

Scott Ganyo commented on NUTCH-258:
-----------------------------------

For the record:  I strongly object to closing this issue for the following reasons:

1) Having the entire system stop processing as a *side-effect* of merely logging a message
at a certain event level is poor practice.  In fact, I believe that this would make a fantastic
anti-pattern.  If this kind of behavior is *really* wanted (and I argue below that it should
not be), it should be done through an explicit mechanism, not as a side-effect.  For example,
did you realize that, since Hadoop hijacks and reassigns all log formatters (also a bad practice!)
in the org.apache.hadoop.util.LogFormatter static initializer, anyone who uses Nutch as a
library and logs a SEVERE error will suffer by having Nutch stop fetching?
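
To make the side-effect in point 1 concrete, here is a minimal sketch built on java.util.logging.
The names SideEffectDemo, SevereTrackingFormatter and fetchAll are hypothetical and this is not
Hadoop's actual LogFormatter code; it only assumes a formatter that records, in a static field,
that a SEVERE record has passed through it, plus a worker loop that polls that field.

```java
import java.io.ByteArrayOutputStream;
import java.util.logging.Formatter;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;
import java.util.logging.StreamHandler;

public class SideEffectDemo {

    /** Hypothetical formatter that remembers, JVM-wide, that a SEVERE record was seen. */
    public static class SevereTrackingFormatter extends Formatter {
        private static volatile boolean loggedSevere = false;

        public static boolean hasLoggedSevere() {
            return loggedSevere;
        }

        @Override
        public String format(LogRecord record) {
            if (record.getLevel().intValue() >= Level.SEVERE.intValue()) {
                loggedSevere = true; // state change hidden inside formatting
            }
            return record.getLevel() + ": " + record.getMessage() + "\n";
        }
    }

    /** Stand-in for a fetch loop that polls the static flag before each item. */
    public static int fetchAll(int urls) {
        int fetched = 0;
        for (int i = 0; i < urls; i++) {
            if (SevereTrackingFormatter.hasLoggedSevere()) {
                break; // stop as soon as anything has logged SEVERE
            }
            fetched++;
        }
        return fetched;
    }

    public static void main(String[] args) {
        // A logger owned by unrelated code that happens to share the formatter.
        Logger appLogger = Logger.getLogger("some.unrelated.library");
        appLogger.setUseParentHandlers(false);
        Handler handler =
            new StreamHandler(new ByteArrayOutputStream(), new SevereTrackingFormatter());
        appLogger.addHandler(handler);

        System.out.println(fetchAll(3));       // prints 3: flag not set yet
        appLogger.severe("unrelated failure"); // unrelated code logs at SEVERE
        handler.flush();
        System.out.println(fetchAll(3));       // prints 0: every later run stops at once
    }
}
```

Nothing in the caller's code mentions fetching, yet one severe() call is enough to halt it, which
is what makes the coupling hard to trace.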

2) Moreover, having the system stop processing forevermore by use of a static(!) flag makes
the use of the Nutch system as a library within a server or service environment impossible.
Once this logging is done, no more Fetcher processing in this run *or any other* can take
place.  This is inappropriate.  You might as well call System.exit() at this point!  In fact,
I could even argue that the current behavior is worse than a System.exit(), as it can actually
obscure why the system has ceased being operational even though it is still ostensibly "running."
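
One possible shape for the explicit mechanism argued for above is sketched here, assuming the abort
state is held per run rather than in a static field.  ExplicitStopDemo, FetchRun, abort() and the
failAt parameter are hypothetical names for illustration only and are not part of Nutch.

```java
import java.util.concurrent.atomic.AtomicReference;

public class ExplicitStopDemo {

    /** Hypothetical fetch run whose abort state belongs to this instance only. */
    public static class FetchRun {
        // null means "still running"; a non-null value is the reason for the abort.
        private final AtomicReference<String> abortReason = new AtomicReference<>();

        /** Explicit request to stop this run; only the first reason is kept. */
        public void abort(String reason) {
            abortReason.compareAndSet(null, reason);
        }

        public boolean isAborted() {
            return abortReason.get() != null;
        }

        public String getAbortReason() {
            return abortReason.get();
        }

        /** Processes items until done or aborted; failAt simulates an unrecoverable error. */
        public int run(int items, int failAt) {
            int processed = 0;
            for (int i = 0; i < items; i++) {
                if (isAborted()) {
                    break;
                }
                if (i == failAt) {
                    abort("unrecoverable error at item " + i);
                    continue;
                }
                processed++;
            }
            return processed;
        }
    }

    public static void main(String[] args) {
        FetchRun first = new FetchRun();
        System.out.println(first.run(5, 2));        // prints 2: stopped after the failure
        System.out.println(first.getAbortReason()); // the reason is available to the caller

        FetchRun second = new FetchRun();           // a later run starts with clean state
        System.out.println(second.run(5, -1));      // prints 5
    }
}
```

Because the flag lives on the instance and carries a reason, a failed run neither poisons later
runs in the same JVM nor hides why it stopped.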

Thus, while there definitely *are* instances of inappropriate logging levels being used, and
I could document them, I believe that this issue lies more in the system and its architecture
than in the use of a particular logging level for a certain event.

> Once Nutch logs a SEVERE log item, Nutch fails forevermore
> ----------------------------------------------------------
>
>          Key: NUTCH-258
>          URL: http://issues.apache.org/jira/browse/NUTCH-258
>      Project: Nutch
>         Type: Bug
>   Components: fetcher
>     Versions: 0.8-dev
>  Environment: All
>     Reporter: Scott Ganyo
>     Priority: Critical
>  Attachments: dumbfix.patch
>
> Once a SEVERE log item is written, Nutch shuts down any fetching forevermore.  This is
> from the run() method in Fetcher.java:
>     public void run() {
>       synchronized (Fetcher.this) {activeThreads++;} // count threads
>       
>       try {
>         UTF8 key = new UTF8();
>         CrawlDatum datum = new CrawlDatum();
>         
>         while (true) {
>           if (LogFormatter.hasLoggedSevere())     // something bad happened
>             break;                                // exit
>           
> Notice the last two lines.  They prevent Nutch from ever fetching again once this
> condition is hit, because LogFormatter stores this flag in a static field.
> (Note that "LogFormatter.hasLoggedSevere()" is also checked in org.apache.nutch.net.URLFilterChecker
> and will disable that class as well.)
> This must be fixed or Nutch cannot be run as any kind of long-running service.  Furthermore,
> I believe it is a poor decision to rely on a logging event to determine the state of the application
> - this could have any number of side-effects that would be extremely difficult to track down.
> (As it already has for me.)


