nutch-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-2046) The crawl script should be able to skip an initial injection.
Date Tue, 18 Apr 2017 18:03:41 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15973183#comment-15973183 ]

ASF GitHub Bot commented on NUTCH-2046:
---------------------------------------

jnioche closed pull request #161: Fix for NUTCH-2046 contributed by jnioche
URL: https://github.com/apache/nutch/pull/161
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> The crawl script should be able to skip an initial injection.
> -------------------------------------------------------------
>
>                 Key: NUTCH-2046
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2046
>             Project: Nutch
>          Issue Type: Improvement
>          Components: crawldb, injector
>    Affects Versions: 1.10
>            Reporter: Luis Lopez
>            Assignee: Lewis John McGibbney
>              Labels: crawl, injection
>             Fix For: 1.14
>
>         Attachments: crawl.patch
>
>
> When our crawl gets really big, a new injection takes considerable time because it
> updates the crawldb. The crawl script should be able to skip the injection and go
> directly to the generate call.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
