nutch-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-2164) Inconsistent 'Modified Time' in crawl db
Date Tue, 23 Aug 2016 08:37:20 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15432388#comment-15432388 ]

ASF GitHub Bot commented on NUTCH-2164:
---------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/nutch/pull/108


> Inconsistent 'Modified Time' in crawl db
> ----------------------------------------
>
>                 Key: NUTCH-2164
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2164
>             Project: Nutch
>          Issue Type: Improvement
>          Components: crawldb, fetcher
>    Affects Versions: 1.11
>            Reporter: Thamme Gowda
>            Priority: Minor
>             Fix For: 1.13
>
>
> The 'Modified time' in the crawldb is invalid: it is set to 0 (the Unix epoch), so a dump shows the epoch shifted by the local timezone difference.
> *How to verify/reproduce:*
>   Run 'nutch readdb /path/to/crawldb -dump yy' and then inspect the contents of 'yy'.
> The following improvements can be made:
> 1. Set the modified time in DefaultFetchSchedule.
> 2. Set ProtocolStatus.lastModified if a modified time is available in the protocol response headers.
> This issue was also discussed on the dev mailing list: http://www.mail-archive.com/dev@nutch.apache.org/msg19803.html#
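The "(0-Timezone Difference)" symptom is what an unset timestamp of 0 ms looks like once it is rendered in a local time zone. The sketch below is illustrative only and is not Nutch code; the class and method names are hypothetical. It formats the same 0L value in UTC and in a zone west of UTC to show why the dump appears to contain a date in late 1969 instead of an obviously empty value.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class EpochDisplay {
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Hypothetical helper: renders a timestamp (ms since the Unix epoch) in the given zone.
    static String format(long millis, String zoneId) {
        return Instant.ofEpochMilli(millis).atZone(ZoneId.of(zoneId)).format(FMT);
    }

    public static void main(String[] args) {
        // A modified time that was never set stays at 0L.
        System.out.println(format(0L, "UTC"));                 // 1970-01-01 00:00:00
        System.out.println(format(0L, "America/Los_Angeles")); // 1969-12-31 16:00:00 (UTC-8)
    }
}
```

Because the stored value is the same 0L in both cases, the apparent date depends only on the time zone of the machine that produced the dump, which matches the inconsistency reported in the issue.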
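Improvement 2 amounts to turning an HTTP Last-Modified header into a millisecond timestamp. The following is a minimal sketch under stated assumptions and does not reproduce the patch merged for this issue. The method name resolveModifiedTime is hypothetical, and falling back to the fetch time when the header is missing or malformed is an assumed policy, not documented Nutch behaviour.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class LastModifiedResolver {
    /**
     * Hypothetical helper: returns the modified time to store for a page, in ms since the epoch.
     * Prefers the HTTP Last-Modified header; if it is missing or unparseable, falls back to
     * the fetch time (an assumed policy) instead of leaving the value at 0.
     */
    static long resolveModifiedTime(String lastModifiedHeader, long fetchTime) {
        if (lastModifiedHeader != null && !lastModifiedHeader.trim().isEmpty()) {
            try {
                // HTTP dates use the RFC 1123 format, e.g. "Tue, 23 Aug 2016 08:37:20 GMT".
                return ZonedDateTime.parse(lastModifiedHeader.trim(), DateTimeFormatter.RFC_1123_DATE_TIME)
                        .toInstant()
                        .toEpochMilli();
            } catch (DateTimeParseException e) {
                // Malformed header: fall through to the fetch-time fallback.
            }
        }
        return fetchTime;
    }

    public static void main(String[] args) {
        long fetchTime = System.currentTimeMillis();
        System.out.println(resolveModifiedTime("Tue, 23 Aug 2016 08:37:20 GMT", fetchTime));
        System.out.println(resolveModifiedTime(null, fetchTime) == fetchTime);         // true
        System.out.println(resolveModifiedTime("not a date", fetchTime) == fetchTime); // true
    }
}
```

Using java.time's built-in RFC_1123_DATE_TIME formatter avoids hand-written date patterns; a real fetcher would also need to decide whether a header value far in the future should be trusted.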



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
