nutch-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-2046) The crawl script should be able to skip an initial injection.
Date Fri, 07 Apr 2017 07:30:41 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15960412#comment-15960412 ]

ASF GitHub Bot commented on NUTCH-2046:
---------------------------------------

sebastian-nagel commented on issue #161: Fix for NUTCH-2046 contributed by jnioche
URL: https://github.com/apache/nutch/pull/161#issuecomment-292463192
 
 
   +1 from my side, but we could add a note to CHANGES.txt (as part of a section about
API-breaking changes). Users need to update scripts calling bin/crawl.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> The crawl script should be able to skip an initial injection.
> -------------------------------------------------------------
>
>                 Key: NUTCH-2046
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2046
>             Project: Nutch
>          Issue Type: Improvement
>          Components: crawldb, injector
>    Affects Versions: 1.10
>            Reporter: Luis Lopez
>            Assignee: Lewis John McGibbney
>              Labels: crawl, injection
>             Fix For: 1.14
>
>         Attachments: crawl.patch
>
>
> When our crawl gets really big, a new injection takes considerable time because it updates
> the crawldb. The crawl script should be able to skip the injection and go directly to the
> generate call.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
