nutch-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-2242) lastModified not always set
Date Sat, 04 Nov 2017 17:17:00 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16239112#comment-16239112 ]

ASF GitHub Bot commented on NUTCH-2242:
---------------------------------------

Omkar20895 commented on issue #238: NUTCH-2242 Injector to stop if job fails to avoid loss of CrawlDb
URL: https://github.com/apache/nutch/pull/238#issuecomment-341913968
 
 
   Closing the PR because a typo in the commit assigned it to NUTCH-2242 rather than NUTCH-2442. Apologies.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> lastModified not always set
> ---------------------------
>
>                 Key: NUTCH-2242
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2242
>             Project: Nutch
>          Issue Type: Bug
>          Components: crawldb
>    Affects Versions: 1.11
>            Reporter: Jurian Broertjes
>            Priority: Minor
>             Fix For: 1.13
>
>         Attachments: NUTCH-2242.patch
>
>
> I observed two issues:
> - When using the DefaultFetchSchedule, CrawlDatum's modifiedTime field is not updated on the first successful fetch.
> - When a document modification is detected (protocol- or signature-wise), the modifiedTime isn't updated.
> I can provide a patch later today.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
