nutch-dev mailing list archives

From "Lewis John McGibbney (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-2097) Proposal for Nutch 3.x
Date Tue, 15 Sep 2015 07:18:45 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-2097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14744983#comment-14744983
] 

Lewis John McGibbney commented on NUTCH-2097:
---------------------------------------------

Hi [~markus17], thanks for the initial comments. I knew this one would kick off some convo ;)
bq. What does 'Complete Ant + Ivy build system overhaul.' actually mean?
I just clarified above... "Complete Ant + Ivy build system overhaul. e.g. replaced with Apache
Maven (Non-back compatible)"
bq. Also, i actually don't see the point in having separate mapper and reducer packages
AFAIK the logic in Nutch 1.11-SNAPSHOT, for example, is currently spread across many
different packages, meaning that it is difficult to locate the custom writables, mappers, reducers,
partitioners, etc. Yes, the proposal to group them together into new packages is maybe
a stark change, but IMHO the codebase is much clearer to communicate and interpret
if they are grouped this way.
On the other hand, do the contents of the [crawl|https://github.com/apache/nutch/tree/trunk/src/java/org/apache/nutch/crawl]
package (under current 1.11-SNAPSHOT) even make that much sense? We have signature
classes, custom writables, core data structures, individual tools, and fetch schedules all
packaged into one parent called crawl! To me this has seemed cluttered and in need of better
organization for a while now.
I think the proposal on the table here begins to make better sense if the above point is taken
into consideration. Do you think the same, or does a package refactoring still seem pointless?
bq. I do think that using Maven is a very good thing, the is true for using mapreduce API's.
I agree, Markus. This should clean up a lot of code and should also lower the barrier to adding
new plugins and to managing dependencies. It will also resolve the "do we put the dependency
within ivy/ivy.xml vs. plugin/${plugin.name}/ivy.xml" issue which comes up from time to time.
Thanks for the response Markus, it's giving me food for thought :)

In all honesty

> Proposal for Nutch 3.x
> ----------------------
>
>                 Key: NUTCH-2097
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2097
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.12
>            Reporter: Nadeem Douba
>            Assignee: Lewis John McGibbney
>
> This is a parent issue which contains a proposal for Nutch 3.x. It's based on my branch
(mr2-mvn at https://github.com/allfro/nutch).  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
