nutch-dev mailing list archives

From "Julien Nioche (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-1147) WebGraph nodeDumper uses only 1 reducer
Date Wed, 16 Apr 2014 14:32:21 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13971471#comment-13971471 ]

Julien Nioche commented on NUTCH-1147:
--------------------------------------

Good idea not to force it to 1, but what about relying on the standard 'mapred.reduce.tasks'
param, which can either be set globally or specified on the command line (see the crawl script),
instead of implementing a special param for it?
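The trade-off in this comment can be modelled in a few lines. This is a simplified, hypothetical sketch and not Nutch or Hadoop code: `resolve_reduce_tasks` and its arguments are illustrative names. It assumes, as Hadoop's `JobConf.getNumReduceTasks()` does, that an unset 'mapred.reduce.tasks' means one reducer. It also assumes that a `-D` option on the command line overrides the value loaded from the site configuration. A reducer count hard-coded in the job, as this issue describes for the nodeDumper, overrides both.

```python
from typing import Mapping, Optional

# Assumption: mirrors Hadoop's fallback of 1 reducer when
# mapred.reduce.tasks is not set anywhere.
DEFAULT_REDUCE_TASKS = 1


def resolve_reduce_tasks(site_conf: Mapping[str, str],
                         cli_overrides: Mapping[str, str],
                         forced: Optional[int] = None) -> int:
    """Return the reducer count a job would run with (hypothetical model).

    site_conf     -- values from global config files (e.g. nutch-site.xml)
    cli_overrides -- values passed as -D key=value on the command line
    forced        -- a count hard-coded in the job code, if any
    """
    if forced is not None:
        # A hard-coded count ignores whatever the user configured.
        return forced
    # Command-line values are applied after the site config, so they win.
    merged = {**site_conf, **cli_overrides}
    return int(merged.get("mapred.reduce.tasks", DEFAULT_REDUCE_TASKS))
```

With `site_conf={"mapred.reduce.tasks": "8"}`, passing `forced=1` still yields 1 reducer, while dropping `forced` yields 8, and a command-line override of "16" yields 16. This is why relying on the standard property, rather than adding a dedicated one, already gives users both a global and a per-invocation knob.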

> WebGraph nodeDumper uses only 1 reducer
> ---------------------------------------
>
>                 Key: NUTCH-1147
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1147
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>            Priority: Trivial
>             Fix For: 1.9
>
>         Attachments: NUTCH-1147-1.5-1.patch
>
>
> The nodeDumper is restricted to only one reducer, making it slow and producing files that are too large.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
