nutch-dev mailing list archives

From "Xue Yong Zhi (JIRA)" <j...@apache.org>
Subject [jira] Created: (NUTCH-633) ParseSegment no longer allows reparsing
Date Mon, 26 May 2008 07:27:55 GMT
ParseSegment no longer allows reparsing
---------------------------------------

                 Key: NUTCH-633
                 URL: https://issues.apache.org/jira/browse/NUTCH-633
             Project: Nutch
          Issue Type: Bug
    Affects Versions: 1.0.0
         Environment: any
            Reporter: Xue Yong Zhi
            Priority: Minor


ParseSegment used to allow reparsing a segment even if parsing had already been enabled in
the Fetcher. It now throws a NumberFormatException because
'content.getMetadata().get(Nutch.FETCH_STATUS_KEY)' returns null in that case, and
Integer.parseInt() rejects a null argument.

This patch will fix the problem:

--- a/src/java/org/apache/nutch/parse/ParseSegment.java
+++ b/src/java/org/apache/nutch/parse/ParseSegment.java
@@ -70,8 +70,10 @@ public class ParseSegment extends Configured implements Tool, Mapper<WritableCom
       key = newKey;
     }
     
+    //status_key is only available when parsing is not done in fetcher
+    String status_key = content.getMetadata().get(Nutch.FETCH_STATUS_KEY);
     int status =
-      Integer.parseInt(content.getMetadata().get(Nutch.FETCH_STATUS_KEY));
+      (null == status_key) ? CrawlDatum.STATUS_FETCH_SUCCESS : Integer.parseInt(status_key);
     if (status != CrawlDatum.STATUS_FETCH_SUCCESS) {
       // content not fetched successfully, skip document
       LOG.debug("Skipping " + key + " as content is not fetched successfully");
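
For anyone who wants to see the behaviour outside of Nutch, here is a minimal stand-alone sketch of the same null-safe fallback. It is only an illustration: a plain Map stands in for the content metadata, and the class name, method names and both constant values are placeholders made up for this example, not the real Nutch definitions.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-alone sketch: a plain Map stands in for Nutch's content metadata,
// and both constants below are placeholders, not the real Nutch values.
class ReparseStatusSketch {
    // Placeholder for Nutch.FETCH_STATUS_KEY (assumed value).
    static final String FETCH_STATUS_KEY = "fetch.status";
    // Placeholder for CrawlDatum.STATUS_FETCH_SUCCESS (assumed value).
    static final int STATUS_FETCH_SUCCESS = 1;

    // Pre-patch logic: parses the value directly, so a missing key fails.
    static int statusBeforePatch(Map<String, String> metadata) {
        return Integer.parseInt(metadata.get(FETCH_STATUS_KEY));
    }

    // Patched logic: a missing key is treated as a successful fetch.
    static int statusAfterPatch(Map<String, String> metadata) {
        String statusKey = metadata.get(FETCH_STATUS_KEY);
        return (statusKey == null) ? STATUS_FETCH_SUCCESS : Integer.parseInt(statusKey);
    }

    public static void main(String[] args) {
        // Metadata without a status entry, as described in this report.
        Map<String, String> reparsed = new HashMap<>();
        try {
            statusBeforePatch(reparsed);
        } catch (NumberFormatException e) {
            System.out.println("before patch: NumberFormatException");
        }
        System.out.println("after patch: " + statusAfterPatch(reparsed));
    }
}
```

When the status entry is present, both methods parse it the same way, so the fallback only changes the case where the entry is missing.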


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

