spark-dev mailing list archives

From Steve Loughran <ste...@cloudera.com.INVALID>
Subject Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?
Date Fri, 01 Nov 2019 12:32:28 GMT
What is the current default value? The 2.x releases are becoming EOL:
2.7 is dead, there might be a 2.8.x, and for now 2.9 is the branch-2 release
getting attention. 2.10.0 shipped yesterday, but the ".0" means there will
inevitably be surprises.

One issue with using older versions is that any problem reported
(especially stack traces you can blame me for) will generally be met with
a response of "does it go away when you upgrade?" The other issue is how
much test coverage things are getting.

w.r.t. Hadoop 3.2 stability, nothing major has been reported. The ABFS
client is there, and the big Guava update (HADOOP-16213) went in. People
will either love or hate that.

No major changes in s3a code between 3.2.0 and 3.2.1; I have a large
backport planned though, including changes to better handle AWS caching of
404s generated by HEAD requests made before an object was actually created.

It would be really good if the Spark distributions shipped with later
versions of the Hadoop artifacts.
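
For reference, a minimal sketch of how the profile choice discussed below plays out at build time. The helper function is hypothetical (not from this thread); the `./build/mvn -P<profile>` invocation assumes the standard Spark Maven build, with the profile names taken from the thread:

```shell
# Hypothetical helper: print the Maven command for a given Hadoop profile.
# Defaults to hadoop-3.2, matching the proposed new default.
spark_build_cmd() {
  profile="${1:-hadoop-3.2}"
  case "$profile" in
    hadoop-2.7|hadoop-3.2)
      # Standard Spark build invocation with the chosen profile activated.
      echo "./build/mvn -P$profile -DskipTests clean package" ;;
    *)
      echo "unknown profile: $profile" >&2
      return 1 ;;
  esac
}

spark_build_cmd             # proposed default: hadoop-3.2
spark_build_cmd hadoop-2.7  # the current default, kept as an option
```

Switching the default profile would change which of these two commands the release scripts effectively run when no profile is given.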

On Mon, Oct 28, 2019 at 7:53 PM Xiao Li <lixiao@databricks.com> wrote:

> The stability and quality of Hadoop 3.2 profile are unknown. The changes
> are massive, including Hive execution and a new version of Hive
> thriftserver.
>
> To reduce the risk, I would like to keep the current default version
> unchanged. When it becomes stable, we can change the default profile to
> Hadoop-3.2.
>
> Cheers,
>
> Xiao
>
> On Mon, Oct 28, 2019 at 12:51 PM Sean Owen <srowen@gmail.com> wrote:
>
>> I'm OK with that, but don't have a strong opinion nor info about the
>> implications.
>> That said my guess is we're close to the point where we don't need to
>> support Hadoop 2.x anyway, so, yeah.
>>
>> On Mon, Oct 28, 2019 at 2:33 PM Dongjoon Hyun <dongjoon.hyun@gmail.com>
>> wrote:
>> >
>> > Hi, All.
>> >
>> > There was a discussion on publishing artifacts built with Hadoop 3.
>> > But, we are still publishing with Hadoop 2.7.3 and `3.0-preview` will
>> be the same because we didn't change anything yet.
>> >
>> > Technically, we need to change two places for publishing.
>> >
>> > 1. Jenkins Snapshot Publishing
>> >
>> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/
>> >
>> > 2. Release Snapshot/Release Publishing
>> >
>> https://github.com/apache/spark/blob/master/dev/create-release/release-build.sh
>> >
>> > To minimize the change, we need to switch our default Hadoop profile.
>> >
>> > Currently, the default is `hadoop-2.7 (2.7.4)` profile and `hadoop-3.2
>> (3.2.0)` is optional.
>> > We had better use `hadoop-3.2` profile by default and `hadoop-2.7`
>> optionally.
>> >
>> > Note that this means we use Hive 2.3.6 by default. Only `hadoop-2.7`
>> distribution will use `Hive 1.2.1` like Apache Spark 2.4.x.
>> >
>> > Bests,
>> > Dongjoon.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>
>>
>
> --
>
