spark-dev mailing list archives

From Felix Cheung <>
Subject Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?
Date Tue, 19 Nov 2019 04:19:45 GMT
1000% with Steve: the org.spark-project Hive 1.2 fork will need a solution. It is old and rather
buggy, and it's been *years*.

I think we should decouple the Hive change from everything else, if people are concerned?

From: Steve Loughran <>
Sent: Sunday, November 17, 2019 9:22:09 AM
To: Cheng Lian <>
Cc: Sean Owen <>; Wenchen Fan <>; Dongjoon
Hyun <>; dev <>; Yuming Wang <>
Subject: Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

Can I take this moment to remind everyone that the version of Hive which Spark has historically
bundled (the org.spark-project one) is an orphan project put together to deal with Hive's
shading issues, and a source of unhappiness in the Hive project. Whatever gets shipped should
do its best to avoid including that artifact.

Postponing the switch to Hadoop 3.x until after Spark 3.0 is probably the safest move from a risk
minimisation perspective: if something breaks, you can start with the assumption
that the problem is in the o.a.s packages without having to debug o.a.hadoop and o.a.hive first. There
is a cost, though: if there are problems with the Hadoop/Hive dependencies, those teams will inevitably
ignore the filed bug reports, for the same reason the Spark team will probably close 1.6-related
JIRAs as WONTFIX. WONTFIX responses for the Hadoop 2.x line include any compatibility issues
with Java 9+. Do bear that in mind: it has not been tested, it has dependencies on artifacts
we know are incompatible, and as far as the Hadoop project is concerned, people should move
to branch-3 if they want to run on a modern version of Java.

It would be really, really good if the published Spark maven artefacts (a) included the spark-hadoop-cloud
JAR and (b) depended on Hadoop 3.x. That way people building their own projects on top of them
will get up-to-date dependencies and won't get WONTFIX responses themselves.
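For illustration, such a build might look like the following; `hadoop-3.2` and `hadoop-cloud` are profile names in the Spark Maven build of that era, but the exact invocation is a sketch, not the actual release procedure:

```shell
# Sketch: build Spark against Hadoop 3.x and include the spark-hadoop-cloud module.
# -Phadoop-3.2  selects the Hadoop 3.x dependency set
# -Phadoop-cloud adds the spark-hadoop-cloud JAR to the build
./build/mvn -DskipTests -Phadoop-3.2 -Phadoop-cloud clean install
```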


PS: There is a discussion on hadoop-dev about making Hadoop 2.10 the official "last ever" branch-2 release
and then declaring its predecessors EOL; 2.10 will be the transition release.

On Sun, Nov 17, 2019 at 1:50 AM Cheng Lian <> wrote:
Dongjoon, I didn't follow the original Hive 2.3 discussion closely. I thought the original
proposal was to replace Hive 1.2 with Hive 2.3, which seemed risky, and therefore we only
introduced Hive 2.3 under the hadoop-3.2 profile without removing Hive 1.2. But maybe I'm
totally wrong here...

Sean, Yuming's PR showed that Hadoop 2 + Hive 2
+ JDK 11 looks promising. My major motivation is not about demand, but risk control: coupling
the Hive 2.3, Hadoop 3.2, and JDK 11 upgrades together looks too risky.

On Sat, Nov 16, 2019 at 4:03 AM Sean Owen <> wrote:
I'd prefer simply not making Hadoop 3 the default until 3.1+, rather
than introduce yet another build combination. Does Hadoop 2 + Hive 2
work and is there demand for it?

On Sat, Nov 16, 2019 at 3:52 AM Wenchen Fan <> wrote:
> Do we have a limitation on the number of pre-built distributions? Seems this time we have:
> 1. hadoop 2.7 + hive 1.2
> 2. hadoop 2.7 + hive 2.3
> 3. hadoop 3 + hive 2.3
> AFAIK we always build with JDK 8 (but make it JDK 11 compatible), so we don't need to add the
JDK version to the combination.
> On Sat, Nov 16, 2019 at 4:05 PM Dongjoon Hyun <> wrote:
>> Thank you for the suggestion.
>> Having `hive-2.3` profile sounds good to me because it's orthogonal to Hadoop 3.
>> IIRC, originally, it was proposed in that way, but we put it under `hadoop-3.2` to
avoid adding new profiles at that time.
>> And I'm wondering if you are considering an additional pre-built distribution and Jenkins
>> Bests,
>> Dongjoon.
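
To make the build combinations discussed above concrete, here is a hedged sketch of what the corresponding Maven invocations could look like; the `hadoop-2.7`, `hadoop-3.2`, and `hive-2.3` profile names come from this thread, so treat the exact flags as illustrative rather than an exact list from the build:

```shell
# Hypothetical build commands for the three pre-built distributions
# discussed in the thread (profile names as used above):
./build/mvn -DskipTests -Phadoop-2.7            clean package  # 1. Hadoop 2.7 + Hive 1.2
./build/mvn -DskipTests -Phadoop-2.7 -Phive-2.3 clean package  # 2. Hadoop 2.7 + Hive 2.3
./build/mvn -DskipTests -Phadoop-3.2            clean package  # 3. Hadoop 3.x + Hive 2.3
```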
