nutch-dev mailing list archives

From "Sebastian Nagel (JIRA)" <>
Subject [jira] [Commented] (NUTCH-2727) Upgrade Hadoop dependencies to 2.9.2
Date Tue, 06 Aug 2019 12:15:00 GMT


Sebastian Nagel commented on NUTCH-2727:

Hi [~markus17], yes, I would also, provided we can guarantee a certain level of backward compatibility.
An upgrade to Hadoop 3.x may force some API changes or dependency upgrades which then make
it impossible to run the Nutch job file on a Hadoop 2.x cluster. The Hadoop version is often
not easy to change because the cluster is shared with legacy applications and/or the cluster
deployment is fixed and bound to a Hadoop distribution (Cloudera, Hortonworks, MapR, EMR,
Azure, etc.). I want to avoid users having to downgrade to get Nutch running on their cluster.
I can confirm that the opposite (running Nutch built with 2.7.4 on Hadoop 3.x) works out of the box.
Do you build Nutch with Hadoop 3.2.0, or do you likewise just run the Nutch job file on a Hadoop
3.2.0 cluster?

> Upgrade Hadoop dependencies to 2.9.2
> ------------------------------------
>                 Key: NUTCH-2727
>                 URL:
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.15
>            Reporter: Sebastian Nagel
>            Priority: Major
>             Fix For: 1.16
> The latest upgrade of the Hadoop dependency dates back to Dec 2017 (NUTCH-2354). We might
> upgrade to the latest version of Hadoop 2.x (2.9.2).
> Note: Nutch 1.15 (or master) built with Hadoop 2.7.4 runs seamlessly on Hadoop 3.x. This
> should also be the case for 2.9.2 (to be tested), so we might still postpone the final upgrade
> to Hadoop 3.x to ensure backward compatibility.

This message was sent by Atlassian JIRA