nutch-dev mailing list archives

From "Michael Joyce (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-1911) Improve DomainStatistics tool command line parsing
Date Thu, 16 Apr 2015 20:37:00 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498689#comment-14498689 ]

Michael Joyce commented on NUTCH-1911:
--------------------------------------

Hey folks,

Here's what the output from this looks like:

{code}
Usage: DomainStatistics inputDirs outDir mode [numOfReducers]
	inputDirs	Comma-separated list of crawldb input directories
			E.g.: crawl/crawldb/current/
	outDir		Output directory where results should be dumped
	mode		Set statistics gathering mode
				host	Gather statistics by host
				domain	Gather statistics by domain
				suffix	Gather statistics by suffix
				tld	Gather statistics by top-level domain
	[numOfReducers]	Optional number of reduce jobs to use. Defaults to 1.
{code}
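For illustration, here is a minimal Java sketch of parsing arguments in the shape the usage text describes: a comma-separated inputDirs list, an outDir, one of the four modes, and an optional reducer count defaulting to 1. The class DomainStatsArgs, its Mode enum, and parse() are hypothetical names chosen for this sketch, not the actual Nutch DomainStatistics code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical sketch of parsing "inputDirs outDir mode [numOfReducers]".
 * Names are illustrative only; this is not Nutch's implementation.
 */
public class DomainStatsArgs {

  /** Statistics gathering modes listed in the usage text. */
  enum Mode { HOST, DOMAIN, SUFFIX, TLD }

  static final String USAGE =
      "Usage: DomainStatistics inputDirs outDir mode [numOfReducers]";

  final List<String> inputDirs;
  final String outDir;
  final Mode mode;
  final int numOfReducers;

  DomainStatsArgs(List<String> inputDirs, String outDir, Mode mode, int numOfReducers) {
    this.inputDirs = inputDirs;
    this.outDir = outDir;
    this.mode = mode;
    this.numOfReducers = numOfReducers;
  }

  static DomainStatsArgs parse(String[] args) {
    // Three required arguments plus one optional argument.
    if (args.length < 3 || args.length > 4) {
      throw new IllegalArgumentException(USAGE);
    }
    // inputDirs is a comma-separated list of crawldb directories.
    List<String> inputDirs = Arrays.asList(args[0].split(","));
    String outDir = args[1];
    // Match the mode name case-insensitively; valueOf throws on unknown names.
    Mode mode;
    try {
      mode = Mode.valueOf(args[2].toUpperCase(Locale.ROOT));
    } catch (IllegalArgumentException e) {
      throw new IllegalArgumentException("Unknown mode: " + args[2] + "\n" + USAGE, e);
    }
    // The reducer count is optional and defaults to 1.
    int numOfReducers = args.length == 4 ? Integer.parseInt(args[3]) : 1;
    if (numOfReducers < 1) {
      throw new IllegalArgumentException("numOfReducers must be >= 1");
    }
    return new DomainStatsArgs(inputDirs, outDir, mode, numOfReducers);
  }

  public static void main(String[] args) {
    DomainStatsArgs parsed =
        parse(new String[] {"crawl/crawldb/current/", "stats", "host"});
    System.out.println(parsed.mode + " " + parsed.numOfReducers); // prints "HOST 1"
  }
}
```

Mapping the mode onto an enum keeps the list of valid values in one place, so an unknown mode fails fast with the usage string instead of silently falling through.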

> Improve DomainStatistics tool command line parsing
> ---------------------------------------------------
>
>                 Key: NUTCH-1911
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1911
>             Project: Nutch
>          Issue Type: Bug
>          Components: util
>    Affects Versions: 1.9, 2.2.1
>            Reporter: Lewis John McGibbney
>            Priority: Trivial
>             Fix For: 1.11
>
>
> The DomainStatistics tool could be improved based on the comments made in [this mail thread|http://www.mail-archive.com/user%40nutch.apache.org/msg13028.html].
> For convenience, I've also pasted them below:
> {quote}
> You cannot just tell it where the crawldb is, you need to tell it where the directory is, so specifying current is ok, but not part-*
> {quote}
> The patch should be trivial.
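The quoted point (pass the crawldb directory such as current, not a part-* entry) could also be checked up front. Below is a minimal sketch, assuming a hypothetical helper CrawlDbPathCheck.looksLikePartFile() that only inspects the last path component as a string; it is not part of Nutch and does not touch the filesystem.

```java
/**
 * Hypothetical helper: flags input paths that point at a part-* entry
 * (e.g. crawl/crawldb/current/part-00000) instead of the crawldb directory.
 * Pure string check; it does not verify that the path exists.
 */
public class CrawlDbPathCheck {

  static boolean looksLikePartFile(String path) {
    // Drop trailing slashes so "crawl/crawldb/current/" yields "current".
    String trimmed = path.replaceAll("/+$", "");
    // Take the last path component (the whole string if there is no '/').
    String lastComponent = trimmed.substring(trimmed.lastIndexOf('/') + 1);
    return lastComponent.startsWith("part-");
  }

  public static void main(String[] args) {
    System.out.println(looksLikePartFile("crawl/crawldb/current/"));           // false
    System.out.println(looksLikePartFile("crawl/crawldb/current/part-00000")); // true
  }
}
```

A check like this could run over each entry of the comma-separated inputDirs list before any job is submitted, turning a confusing downstream failure into an immediate, explicit error.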



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
