nutch-dev mailing list archives

From "Markus Jelsma (JIRA)" <j...@apache.org>
Subject [jira] [Created] (NUTCH-2093) Indexing filters have no signature in CrawlDatum if crawled via FreeGenerator
Date Fri, 11 Sep 2015 12:14:45 GMT
Markus Jelsma created NUTCH-2093:
------------------------------------

             Summary: Indexing filters have no signature in CrawlDatum if crawled via FreeGenerator
                 Key: NUTCH-2093
                 URL: https://issues.apache.org/jira/browse/NUTCH-2093
             Project: Nutch
          Issue Type: Bug
          Components: indexer
    Affects Versions: 1.10
            Reporter: Markus Jelsma
            Priority: Minor
             Fix For: 1.11
         Attachments: NUTCH-2093.patch

In IndexerMapReduce, a fetchDatum is passed to the indexing filters. However, when this fetchDatum
was created via FreeGenerator, it has no signature attached, so the indexing filters never see
a signature.

This patch copies the signature from the dbDatum to the fetchDatum just before the fetchDatum is passed to the indexing filters.
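
The attached patch is not reproduced in this message, so the following is only a rough sketch of the idea and not the actual change. It assumes the signature is a byte array exposed through getSignature()/setSignature(), as on Nutch's CrawlDatum. The Datum class and the copySignature helper are hypothetical stand-ins so the example runs without Nutch on the classpath. Copying only when the fetchDatum has no signature of its own is also an assumption.

```java
import java.util.Arrays;

public class SignatureCopySketch {

  /** Hypothetical stand-in for CrawlDatum; only the signature field is modeled. */
  public static final class Datum {
    private byte[] signature;

    public byte[] getSignature() {
      return signature;
    }

    public void setSignature(byte[] signature) {
      this.signature = signature;
    }
  }

  /**
   * Hypothetical helper: give fetchDatum the signature stored on dbDatum
   * when fetchDatum has none. Assumption: an existing signature on
   * fetchDatum is left untouched.
   */
  public static void copySignature(Datum dbDatum, Datum fetchDatum) {
    if (dbDatum == null || fetchDatum == null) {
      return; // nothing to copy from or to
    }
    if (fetchDatum.getSignature() == null && dbDatum.getSignature() != null) {
      fetchDatum.setSignature(dbDatum.getSignature());
    }
  }

  public static void main(String[] args) {
    Datum dbDatum = new Datum();
    dbDatum.setSignature(new byte[] {0x1f, 0x2e, 0x3d});

    // Models a fetchDatum created via FreeGenerator: no signature attached.
    Datum fetchDatum = new Datum();

    copySignature(dbDatum, fetchDatum);

    // An indexing filter receiving fetchDatum would now find a signature.
    System.out.println(Arrays.toString(fetchDatum.getSignature()));
  }
}
```

In IndexerMapReduce the equivalent call would sit right before the fetchDatum is handed to the indexing filters, so every filter sees the same signature that the CrawlDb holds.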



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
