nutch-dev mailing list archives

From "Lewis John McGibbney (JIRA)" <>
Subject [jira] [Updated] (NUTCH-2184) Enable IndexingJob to function with no crawldb
Date Mon, 25 Jan 2016 22:29:39 GMT


Lewis John McGibbney updated NUTCH-2184:
    Attachment: NUTCH-2184v2.patch

Updated patch for trunk. [~markus17], I am working to address your comments now. Thanks for
the response; I must have missed them.

> Enable IndexingJob to function with no crawldb
> ----------------------------------------------
>                 Key: NUTCH-2184
>                 URL:
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>             Fix For: 1.12
>         Attachments: NUTCH-2184.patch, NUTCH-2184v2.patch
> Sometimes when working with distributed teams, we have found that we can lose data structures which are currently considered critical, e.g. crawldb, linkdb and/or segments.
> In my current scenario I have a requirement to index segment data with no accompanying crawldb or linkdb.
> The absence of the latter is OK as linkdb is optional; however, crawldb is currently mandatory in IndexerMapReduce.
> This ticket should enhance the IndexerMapReduce code to support the use case where you have ONLY segments and want to force an index for every record present.
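
To illustrate the behaviour requested above, here is a minimal sketch of the per-URL decision. It is a simplified, hypothetical model and not the actual IndexerMapReduce code or the attached patch: the class name, method name and boolean parameters are all assumptions made for illustration.

```java
// Hypothetical, simplified model of the per-URL indexing decision.
// None of these names come from Nutch; they only illustrate the request.
final class SegmentOnlyIndexingSketch {

    private SegmentOnlyIndexingSketch() {
    }

    /**
     * Decide whether a URL should be indexed.
     *
     * @param hasSegmentData  the segments contain fetch/parse data for the URL
     * @param crawlDbSupplied a crawldb was passed to the indexing job
     * @param hasCrawlDbEntry the supplied crawldb has an entry for the URL
     */
    static boolean shouldIndex(boolean hasSegmentData,
                               boolean crawlDbSupplied,
                               boolean hasCrawlDbEntry) {
        // Segment data is required in every case.
        if (!hasSegmentData) {
            return false;
        }
        // With a crawldb, keep the existing rule: the entry is mandatory.
        if (crawlDbSupplied) {
            return hasCrawlDbEntry;
        }
        // Without a crawldb, index every record present in the segments.
        return true;
    }

    public static void main(String[] args) {
        // Segments only, no crawldb supplied.
        System.out.println(shouldIndex(true, false, false));
        // Crawldb supplied but it has no entry for this URL.
        System.out.println(shouldIndex(true, true, false));
    }
}
```

The point of the sketch is that the crawldb check becomes conditional on a crawldb having been supplied at all, so jobs that do pass one behave as before.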

This message was sent by Atlassian JIRA
