nutch-dev mailing list archives

From "Markus Jelsma (Created) (JIRA)" <j...@apache.org>
Subject [jira] [Created] (NUTCH-1322) Indexer not to reindex unmodified docs
Date Fri, 30 Mar 2012 20:07:26 GMT
Indexer not to reindex unmodified docs
--------------------------------------

                 Key: NUTCH-1322
                 URL: https://issues.apache.org/jira/browse/NUTCH-1322
             Project: Nutch
          Issue Type: Improvement
          Components: indexer
    Affects Versions: 1.4
            Reporter: Markus Jelsma
            Assignee: Markus Jelsma


IndexerMapReduce already tries to skip pages whose fetch status is set to unmodified.
This does not always work, however: some documents lack that fetch status even though
they have not been modified at all.

The indexer should have an option to skip reindexing these pages as well.
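
To illustrate the idea, here is a minimal, self-contained sketch of such a check. It is
not the IndexerMapReduce code and not a proposed patch. The FetchStatus enum, the
shouldIndex method and the skipUnmodified flag are all hypothetical names, and comparing
content signatures is only an assumed way of detecting that a page has not changed.

```java
import java.util.Arrays;

public class UnmodifiedCheck {

    /** Hypothetical stand-in for a fetch status; not Nutch's own status constants. */
    enum FetchStatus { SUCCESS, NOT_MODIFIED }

    /**
     * Decides whether a document should be sent to the index.
     *
     * @param status            fetch status of the latest fetch
     * @param previousSignature content signature from the previous fetch, or null if none
     * @param currentSignature  content signature from the latest fetch, or null if none
     * @param skipUnmodified    hypothetical option enabling the extra signature check
     */
    static boolean shouldIndex(FetchStatus status,
                               byte[] previousSignature,
                               byte[] currentSignature,
                               boolean skipUnmodified) {
        // Existing behaviour described above: skip when the fetch status says unmodified.
        if (status == FetchStatus.NOT_MODIFIED) {
            return false;
        }
        // Optional extra check (assumption): skip when both signatures exist and match,
        // even though the fetch status did not report the page as unmodified.
        if (skipUnmodified
                && previousSignature != null
                && currentSignature != null
                && Arrays.equals(previousSignature, currentSignature)) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] oldSig = {1, 2, 3};
        byte[] sameSig = {1, 2, 3};
        byte[] newSig = {9, 9, 9};

        // Same content, status not set to unmodified, option enabled.
        System.out.println(shouldIndex(FetchStatus.SUCCESS, oldSig, sameSig, true));
        // Same content, but the option is disabled.
        System.out.println(shouldIndex(FetchStatus.SUCCESS, oldSig, sameSig, false));
        // Content changed.
        System.out.println(shouldIndex(FetchStatus.SUCCESS, oldSig, newSig, true));
    }
}
```

Keeping the check behind a flag matches the "optionally" in the request: with the flag
off, only the fetch status decides, as before.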


        
