nutch-dev mailing list archives

From "Markus Jelsma (JIRA)" <j...@apache.org>
Subject [jira] [Closed] (NUTCH-1322) Indexer not to reindex unmodified docs
Date Mon, 23 Apr 2012 08:53:44 GMT

     [ https://issues.apache.org/jira/browse/NUTCH-1322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma closed NUTCH-1322.
--------------------------------

    Resolution: Duplicate
    
> Indexer not to reindex unmodified docs
> --------------------------------------
>
>                 Key: NUTCH-1322
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1322
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>    Affects Versions: 1.4
>            Reporter: Markus Jelsma
>
> IndexerMapReduce already attempts not to index unmodified pages if their fetch status is set to unmodified. This, however, doesn't always work. Some documents do not have that fetch status but are actually not modified at all.
> The indexer should optionally be able not to reindex these pages.
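
The behaviour requested above could be sketched roughly as follows. This is a language-neutral illustration in Python, not Nutch code: the status labels, the function names, and the choice of an MD5 digest as the content signature are all assumptions made for the example. The idea is to skip a page when its fetch status says it is unmodified, and also when its content signature matches the one recorded for the previous fetch.

```python
import hashlib
from typing import Optional

# Hypothetical status labels for this sketch; not Nutch's actual constants.
FETCH_SUCCESS = "fetch_success"
FETCH_NOTMODIFIED = "fetch_notmodified"


def content_signature(content: bytes) -> str:
    """Return a hex digest identifying the page content (MD5 is an assumption)."""
    return hashlib.md5(content).hexdigest()


def should_reindex(fetch_status: str,
                   previous_signature: Optional[str],
                   content: bytes) -> bool:
    """Decide whether a fetched page needs to be sent to the index again."""
    # Case 1: the fetcher already reported the page as unmodified.
    if fetch_status == FETCH_NOTMODIFIED:
        return False
    # Case 2: the status does not say "unmodified", but the content is
    # byte-for-byte identical to what was seen on the previous fetch.
    if previous_signature is not None and previous_signature == content_signature(content):
        return False
    # Otherwise the page is new or has changed.
    return True
```

A page fetched for the first time has no previous signature and is therefore always indexed. Making the second check optional, as the issue asks, would amount to guarding it with a configuration flag.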

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
