nutch-dev mailing list archives

From "Asitang Mishra (JIRA)" <>
Subject [jira] [Commented] (NUTCH-2049) Upgrade Trunk to Hadoop > 2.4 stable
Date Tue, 18 Aug 2015 16:08:45 GMT


Asitang Mishra commented on NUTCH-2049:

Hi Chris,

Because the Naive Bayes plugin runs a Hadoop job of its own, it works only in local mode and
not in distributed mode. The Parse job, of which this plugin is a part, is itself a Hadoop job,
so the plugin's job becomes a nested Hadoop job.

The training part of the plugin is the only part that is a Hadoop job; classification is not.
I can therefore make a separate tool for training and keep only the classification part in the
plugin. I have tested the classification part in distributed mode.
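
To illustrate the proposed split, here is a minimal, self-contained sketch. It is not the
plugin's actual code: the class name NaiveBayesSplitSketch, the Model type, and the train and
classify methods are all hypothetical, and the toy word-count model stands in for whatever the
real plugin uses. The point is only that train() could live in a standalone tool (which may
itself be a Hadoop job), while classify() is plain in-memory computation that launches no job
and so can be called from inside the Parse job.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; none of these names come from Nutch or the plugin. */
public class NaiveBayesSplitSketch {

    /** Model produced by training and consumed by classification. */
    static final class Model {
        final Map<String, Integer> docsPerLabel = new HashMap<>();
        final Map<String, Map<String, Integer>> termCounts = new HashMap<>();
        final Map<String, Integer> termsPerLabel = new HashMap<>();
        final Set<String> vocabulary = new HashSet<>();
        int totalDocs;
    }

    /** Training step: in the proposal this would move to a separate tool. */
    static Model train(List<String[]> labelledDocs) {
        Model m = new Model();
        for (String[] pair : labelledDocs) {
            String label = pair[0];
            m.totalDocs++;
            m.docsPerLabel.merge(label, 1, Integer::sum);
            Map<String, Integer> counts =
                m.termCounts.computeIfAbsent(label, k -> new HashMap<>());
            for (String term : pair[1].toLowerCase().split("\\s+")) {
                if (term.isEmpty()) continue;
                counts.merge(term, 1, Integer::sum);
                m.termsPerLabel.merge(label, 1, Integer::sum);
                m.vocabulary.add(term);
            }
        }
        return m;
    }

    /** Classification step: pure computation over the model, starts no job. */
    static String classify(Model m, String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : m.docsPerLabel.keySet()) {
            // Log prior plus log likelihoods with add-one smoothing.
            double score = Math.log((double) m.docsPerLabel.get(label) / m.totalDocs);
            Map<String, Integer> counts = m.termCounts.get(label);
            int total = m.termsPerLabel.getOrDefault(label, 0);
            for (String term : text.toLowerCase().split("\\s+")) {
                if (term.isEmpty()) continue;
                int c = counts.getOrDefault(term, 0);
                score += Math.log((c + 1.0) / (total + m.vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up training data, for illustration only.
        Model model = train(Arrays.asList(
            new String[] {"relevant", "hadoop crawl parse"},
            new String[] {"irrelevant", "cooking recipes pasta"}));
        System.out.println(classify(model, "parse hadoop job"));
    }
}
```

In a real setup the model would presumably be written to a file by the training tool and loaded
by the plugin; that persistence step is left out of the sketch.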


> Upgrade Trunk to Hadoop > 2.4 stable
> ------------------------------------
>                 Key: NUTCH-2049
>                 URL:
>             Project: Nutch
>          Issue Type: Improvement
>          Components: build
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>              Labels: memex
>             Fix For: 1.11
>         Attachments: NUTCH-2049.patch, NUTCH-2049v2.patch
> Convo here -
> I am +1 for taking trunk (or a branch of trunk) to explicit dependency on > Hadoop
> We can run our tests, we can validate, we can fix.
> I will be doing validation on 2.X in parallel as this is what I use on my own projects.

This message was sent by Atlassian JIRA
