spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Steve Loughran <ste...@cloudera.com.INVALID>
Subject Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?
Date Fri, 22 Nov 2019 13:00:49 GMT
On Tue, Nov 19, 2019 at 10:40 PM Cheng Lian <lian.cs.zju@gmail.com> wrote:

> Hey Steve,
>
> In terms of Maven artifact, I don't think the default Hadoop version
> matters except for the spark-hadoop-cloud module, which is only meaningful
> under the hadoop-3.2 profile. All  the other spark-* artifacts published to
> Maven central are Hadoop-version-neutral.
>

It's more that everyone using it has to do the game of excluding all the
old artifacts and requesting the new dependencies -including working out
what the spark poms excluded from their imports of later versions of
things.


>
> Another issue about switching the default Hadoop version to 3.2 is PySpark
> distribution. Right now, we only publish PySpark artifacts prebuilt with
> Hadoop 2.x to PyPI. I'm not sure whether bumping the Hadoop dependency to
> 3.2 is feasible for PySpark users. Or maybe we should publish PySpark
> prebuilt with both Hadoop 2.x and 3.x. I'm open to suggestions on this one.
>
> Again, as long as Hive 2.3 and Hadoop 3.2 upgrade can be decoupled via the
> proposed hive-2.3 profile, I personally don't have a preference over having
> Hadoop 2.7 or 3.2 as the default Hadoop version. But just for minimizing
> the release management work, in case we decided to publish other spark-*
> Maven artifacts from a Hadoop 2.7 build, we can still special case
> spark-hadoop-cloud and publish it using a hadoop-3.2 build.
>

that would really complicate life on maven. sticking a version on mvn
central with the 3.2 dependencies consistently would be better



>>>

Mime
View raw message