nutch-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-2409) Injector: complete command-line help and counters
Date Thu, 17 Aug 2017 10:55:00 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16130245#comment-16130245 ]

ASF GitHub Bot commented on NUTCH-2409:
---------------------------------------

sebastian-nagel opened a new pull request #215: NUTCH-2409 Injector: complete command-line help and counters
URL: https://github.com/apache/nutch/pull/215
 
 
   - add counters for items removed from CrawlDb
   - add -Ddb.update.purge.404=true to command-line help
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Injector: complete command-line help and counters
> -------------------------------------------------
>
>                 Key: NUTCH-2409
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2409
>             Project: Nutch
>          Issue Type: Improvement
>          Components: injector
>    Affects Versions: 1.13
>            Reporter: Sebastian Nagel
>            Priority: Trivial
>             Fix For: 1.14
>
>
> See discussion in [NUTCH-2335|https://issues.apache.org/jira/browse/NUTCH-2335?focusedCommentId=16130178&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16130178]:
> - add counters for removed items from CrawlDb:
> {noformat}
> Injector: Total urls removed from CrawlDb by filters: 2
> Injector: Total urls with status gone removed from CrawlDb (db.update.purge.404): 0
> {noformat}
> - add {{-Ddb.update.purge.404=true}} to command-line help:
> {noformat}
> Usage: Injector [-D...] <crawldb> <url_dir> [-overwrite|-update] [-noFilter] [-noNormalize] [-filterNormalizeAll]
> ...
>  -D...          set or overwrite configuration property (property=value)
>  -Ddb.update.purge.404=true
>                 remove URLs with status gone (404) from CrawlDb
> {noformat}
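
The counter lines quoted in the issue could be read back from captured Injector output along these lines. This is only a minimal sketch: the function name `parse_injector_counters` is hypothetical and not part of Nutch, and it assumes each counter line has exactly the shape shown in the sample above ("Injector: <description>: <integer>") with no timestamp or log-level prefix.

```python
import re

# Assumed line shape, taken from the sample output quoted in the issue:
#   "Injector: <description>: <integer>"
# Real log files may prepend timestamps or log levels; this pattern
# would need adjusting in that case.
COUNTER_LINE = re.compile(r"^Injector: (?P<name>.+): (?P<value>\d+)\s*$")


def parse_injector_counters(log_text):
    """Return {description: count} for every matching counter line.

    Hypothetical helper for illustration; not part of Nutch.
    """
    counters = {}
    for line in log_text.splitlines():
        match = COUNTER_LINE.match(line.strip())
        if match:
            counters[match.group("name")] = int(match.group("value"))
    return counters


# Sample lines copied from the issue description.
sample = """\
Injector: Total urls removed from CrawlDb by filters: 2
Injector: Total urls with status gone removed from CrawlDb (db.update.purge.404): 0
"""

counters = parse_injector_counters(sample)
for name, value in counters.items():
    print(f"{name} = {value}")
```

Lines that do not end in a colon followed by an integer are ignored, so other Injector messages pass through without being mistaken for counters.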



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
