nutch-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <>
Subject [jira] [Commented] (NUTCH-2046) The crawl script should be able to skip an initial injection.
Date Fri, 07 Apr 2017 07:30:41 GMT


ASF GitHub Bot commented on NUTCH-2046:

sebastian-nagel commented on issue #161: Fix for NUTCH-2046 contributed by jnioche
   +1 from my side, but we could add a note to CHANGES.txt (as part of a section about
   API-breaking changes). Users need to update scripts calling bin/crawl.

> The crawl script should be able to skip an initial injection.
> -------------------------------------------------------------
>                 Key: NUTCH-2046
>                 URL:
>             Project: Nutch
>          Issue Type: Improvement
>          Components: crawldb, injector
>    Affects Versions: 1.10
>            Reporter: Luis Lopez
>            Assignee: Lewis John McGibbney
>              Labels: crawl, injection
>             Fix For: 1.14
>         Attachments: crawl.patch
> When our crawl gets really big, a new injection takes considerable time as it updates
> the crawldb. The crawl script should be able to skip the injection and go directly to the generate
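
The requested behaviour can be illustrated with a minimal sketch. This is not the
bin/crawl implementation or the attached patch; the function name plan_crawl_steps
and its parameters are hypothetical, and only the inject and generate steps named in
the issue are modelled. The idea is simply that injection becomes conditional on a
seed directory being supplied.

```python
from typing import List, Optional


def plan_crawl_steps(seed_dir: Optional[str], rounds: int) -> List[str]:
    """Return the ordered step names for a crawl (hypothetical helper).

    If seed_dir is None or empty, the initial injection is skipped and the
    plan starts directly with the first generate step.
    """
    if rounds < 1:
        raise ValueError("rounds must be at least 1")

    steps: List[str] = []
    if seed_dir:
        # Injection only runs when a seed directory was actually given.
        steps.append("inject")
    for _ in range(rounds):
        # Other per-round steps are omitted from this sketch.
        steps.append("generate")
    return steps


if __name__ == "__main__":
    print(plan_crawl_steps("urls", 2))  # with an initial injection
    print(plan_crawl_steps(None, 2))    # injection skipped
```

Making the seed directory optional, rather than adding a separate "skip" switch, is one
possible design; as the comment above notes, any change to the arguments means scripts
calling bin/crawl need updating.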

