nutch-dev mailing list archives

From "hanchi (JIRA)" <j...@apache.org>
Subject [jira] [Created] (NUTCH-1784) CLONE - modifiedTime and prevmodifiedTime never set
Date Sat, 17 May 2014 11:53:14 GMT
hanchi created NUTCH-1784:
-----------------------------

             Summary: CLONE - modifiedTime and prevmodifiedTime never set 
                 Key: NUTCH-1784
                 URL: https://issues.apache.org/jira/browse/NUTCH-1784
             Project: Nutch
          Issue Type: Bug
    Affects Versions: 2.2.1
            Reporter: hanchi
             Fix For: 2.3
         Attachments: NUTCH-1651.patch

modifiedTime is never set. With DefaultFetchScheduler, modifiedTime always keeps its default
value of zero. With AdaptiveFetchScheduler, modifiedTime is set only once, at the beginning,
by the zero check in AdaptiveFetchScheduler.
This is not sufficient, because modifiedTime needs to be updated whenever a last modified time
is available. We corrected this with a patch.

We also noticed that prevModifiedTime is not written to the database, and we corrected that too.

With this patch, whenever lastModifiedTime is available, we do two things. First, we copy the
current modifiedTime of the Page object into prevModifiedTime. Then we set modifiedTime to
lastModifiedTime.
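
The order of the two assignments matters: the old value has to be saved before it is overwritten.
A minimal sketch of that update follows. It is not the code from the attached patch. The Page
class here is a hypothetical stand-in for the stored page record, applyLastModified is a made-up
helper name, and treating a value of zero or less as "not available" is an assumption of this sketch.

```java
public class ModifiedTimeSketch {

    /** Hypothetical stand-in for the stored page record; not the real Nutch class. */
    static class Page {
        long modifiedTime;      // 0 means "never set"
        long prevModifiedTime;  // 0 means "never set"
    }

    /**
     * Applies a newly observed last-modified time, if there is one.
     * Assumption: a value <= 0 means no last-modified time was available.
     */
    static void applyLastModified(Page page, long lastModifiedTime) {
        if (lastModifiedTime <= 0) {
            return; // nothing available; leave both fields untouched
        }
        // Step 1: keep the old value before it is overwritten.
        page.prevModifiedTime = page.modifiedTime;
        // Step 2: record the new value.
        page.modifiedTime = lastModifiedTime;
    }

    public static void main(String[] args) {
        Page page = new Page();
        applyLastModified(page, 1000L); // first fetch that reports a time
        applyLastModified(page, 2000L); // later fetch reports a newer time
        System.out.println(page.prevModifiedTime + " " + page.modifiedTime);
        // prints "1000 2000"
    }
}
```

If the two assignments were swapped, prevModifiedTime would always equal modifiedTime and the
previous value would be lost.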





--
This message was sent by Atlassian JIRA
(v6.2#6252)
