nutch-dev mailing list archives

From "Markus Jelsma (Updated) (JIRA)" <>
Subject [jira] [Updated] (NUTCH-1341) NotModified time set to now but page not modified
Date Thu, 19 Apr 2012 14:54:45 GMT


Markus Jelsma updated NUTCH-1341:

    Attachment: NUTCH-1341-1.6-1.patch

Here's a patch for 1.6. It resets the CrawlDatum's modifiedTime to its previous value right
after the reducer sets the STATUS_DB_NOTMODIFIED status. Since I believe that status is
correct, I assume the modifiedTime value can safely be reset at that point as well.
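
A minimal Java sketch of the idea behind the patch, assuming a simplified stand-in for
CrawlDatum. The names Status, Datum and restoreModifiedTime are hypothetical and are not
Nutch's actual API. The real patch works on CrawlDatum inside the CrawlDb reducer and may
differ in detail.

```java
import java.util.Objects;

/**
 * Sketch of the NUTCH-1341 idea using a hypothetical stand-in for CrawlDatum.
 * None of these names are Nutch's real API.
 */
public class NotModifiedSketch {

    // Hypothetical subset of db statuses (not Nutch's byte constants).
    enum Status { DB_FETCHED, DB_NOTMODIFIED }

    // Hypothetical stand-in for CrawlDatum: only the fields this fix touches.
    static final class Datum {
        Status status;
        long modifiedTime; // epoch millis

        Datum(Status status, long modifiedTime) {
            this.status = status;
            this.modifiedTime = modifiedTime;
        }
    }

    /**
     * Called right after the reducer has chosen the new status. If the page was
     * judged not modified, the fetch-time "now" is discarded and the previous
     * modifiedTime is carried over from the old datum.
     */
    static void restoreModifiedTime(Datum result, Datum old) {
        Objects.requireNonNull(result, "result");
        if (result.status == Status.DB_NOTMODIFIED && old != null) {
            result.modifiedTime = old.modifiedTime;
        }
    }

    public static void main(String[] args) {
        long lastChange = 1_000L; // when the page really changed
        long fetchTime = 5_000L;  // "now" at fetch time
        Datum old = new Datum(Status.DB_FETCHED, lastChange);
        Datum result = new Datum(Status.DB_NOTMODIFIED, fetchTime);
        restoreModifiedTime(result, old);
        System.out.println(result.modifiedTime); // prints 1000
    }
}
```

The reset is keyed on the status alone, so any path that yields db_notmodified (signature
match or a not-modified fetch status) keeps the old timestamp, while other statuses are left
untouched.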

Please comment. Did I overlook something? Should it be implemented differently?

> NotModified time set to now but page not modified
> -------------------------------------------------
>                 Key: NUTCH-1341
>                 URL:
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 1.5
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.6
>         Attachments: NUTCH-1341-1.6-1.patch
> Servers tend to respond with incorrect or no value for LastModified. By comparing signatures
> or when (fetch.getStatus() == CrawlDatum.STATUS_FETCH_NOTMODIFIED) the reducer correctly sets
> the db_notmodified status for the CrawlDatum. The modifiedTime value, however, is not set
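
For illustration, the detection condition described in the issue (signature comparison, or a
fetch status of not-modified) might look like the Java sketch below. FetchStatus and
isNotModified are hypothetical names standing in for CrawlDatum's status constants and the
reducer's checks; this is not Nutch code. Per the issue title, the problem is that when this
condition holds, modifiedTime still ends up as the fetch time.

```java
import java.util.Arrays;

/** Hypothetical sketch of the db_notmodified decision; not Nutch's real API. */
public class NotModifiedCheck {

    // Hypothetical subset of fetch statuses (not Nutch's byte constants).
    enum FetchStatus { FETCH_SUCCESS, FETCH_NOTMODIFIED }

    /**
     * True when the page should be marked db_notmodified: either the fetch itself
     * reported "not modified" (assumed to come from, e.g., an HTTP 304 response), or
     * the new content signature equals the one stored from the previous fetch.
     * Only these two signals from the issue are used here.
     */
    static boolean isNotModified(FetchStatus fetchStatus,
                                 byte[] oldSignature, byte[] newSignature) {
        if (fetchStatus == FetchStatus.FETCH_NOTMODIFIED) {
            return true;
        }
        // A missing signature on either side cannot prove the content is unchanged.
        return oldSignature != null && newSignature != null
                && Arrays.equals(oldSignature, newSignature);
    }

    public static void main(String[] args) {
        byte[] stored = {1, 2, 3};
        System.out.println(isNotModified(FetchStatus.FETCH_SUCCESS, stored, new byte[] {1, 2, 3})); // true
        System.out.println(isNotModified(FetchStatus.FETCH_SUCCESS, stored, new byte[] {9}));       // false
        System.out.println(isNotModified(FetchStatus.FETCH_NOTMODIFIED, null, null));               // true
    }
}
```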

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see:

