nutch-dev mailing list archives

From "Michael Joyce (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-1988) Make nested output directory dump optional
Date Wed, 15 Apr 2015 19:25:58 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496755#comment-14496755 ]

Michael Joyce commented on NUTCH-1988:
--------------------------------------

Hi folks. Here's an example run showing the output with the default nested layout (foodir) and with the -flatdir option (foodir2).

{code}
[mjjoyce@machine local]$ bin/nutch dump -outputDir ./foodir -segment ../local_elasticsearch_testt/crawl/segments/
[mjjoyce@machine local]$ bin/nutch dump -flatdir -outputDir ./foodir2 -segment ../local_elasticsearch_testt/crawl/segments/
[mjjoyce@machine local]$ ls -R foodir
foodir:
8f  f8

foodir/8f:
a7

foodir/8f/a7:
8d84f847f7310620a9edc4327bbfc133_.html

foodir/f8:
df

foodir/f8/df:
fec7849283af7a0adc77eddefb242b6e_.html
[mjjoyce@machine local]$ ls -R foodir2
foodir2:
8d84f847f7310620a9edc4327bbfc133_.html  fec7849283af7a0adc77eddefb242b6e_.html
[mjjoyce@machine local]$ 
{code}
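
For anyone reading the listing above: in these two files, the first directory level matches characters 0 and 8 of the 32-character hex file name, and the second level matches characters 16 and 24. The sketch below reproduces that layout in Python. The mapping is inferred from this sample output only and is not confirmed as what the dump tool does internally; the function names nested_dir and output_path are hypothetical.

```python
from pathlib import PurePosixPath


def nested_dir(hex_name: str) -> PurePosixPath:
    """Two-level directory for a hex name, as observed in the sample listing.

    Assumption: level one is chars 0 and 8, level two is chars 16 and 24.
    This is inferred from the two files shown, not taken from the Nutch source.
    """
    if len(hex_name) < 25:
        raise ValueError("expected a hex name of at least 25 characters")
    first = hex_name[0] + hex_name[8]
    second = hex_name[16] + hex_name[24]
    return PurePosixPath(first) / second


def output_path(output_dir: str, file_name: str, flat: bool = False) -> PurePosixPath:
    """Build the path of a dumped file, nested by default or flat on request."""
    base = PurePosixPath(output_dir)
    if flat:
        # Flat layout: every file sits directly in the output directory.
        return base / file_name
    # Nested layout: the hex part is everything before the first underscore.
    hex_name = file_name.split("_", 1)[0]
    return base / nested_dir(hex_name) / file_name


if __name__ == "__main__":
    name = "8d84f847f7310620a9edc4327bbfc133_.html"
    print(output_path("foodir", name))
    print(output_path("foodir2", name, flat=True))
```

Run against the two file names in the listing, this gives the same foodir and foodir2 paths shown there.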

> Make nested output directory dump optional
> ------------------------------------------
>
>                 Key: NUTCH-1988
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1988
>             Project: Nutch
>          Issue Type: Improvement
>          Components: dumpers
>    Affects Versions: 1.9
>            Reporter: Michael Joyce
>            Priority: Minor
>             Fix For: 1.10
>
>
> NUTCH-1957 added nested directories to the bin/nutch dump output to help avoid naming
> conflicts in output files. It would be nice to be able to specify that you want the older
> flat directory output as an optional parameter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
