spark-dev mailing list archives

From Dongjoon Hyun <dongjoon.h...@gmail.com>
Subject Re: Removing the usage of forked `hive` in Apache Spark 3.0 (SPARK-20202)
Date Wed, 20 Nov 2019 07:40:06 GMT
Cheng, could you elaborate on your criterion, `Hive 2.3 code paths are
proven to be stable`?
For me, it's difficult to imagine how we can reach a stable state when
we don't use it at all ourselves.

    > The Hive 1.2 code paths can only be removed once the Hive 2.3 code
    > paths are proven to be stable.

Sean, our published POM points to and advertises the illegitimate Hive
1.2 fork as a compile dependency.
Yes, it can be overridden. So why does Apache Spark need to publish it like
that?
If someone wants to use that illegitimate Hive 1.2 fork, let them override
it. We are unable to delete that illegitimate Hive 1.2 fork;
those artifacts will be orphans.

    > The published POM will be agnostic to Hadoop / Hive; well,
    > it will link against a particular version but can be overridden.

    - https://mvnrepository.com/artifact/org.apache.spark/spark-hive_2.12/3.0.0-preview
       -> https://mvnrepository.com/artifact/org.spark-project.hive/hive-exec/1.2.1.spark2
       -> https://mvnrepository.com/artifact/org.spark-project.hive/hive-metastore/1.2.1.spark2

Bests,
Dongjoon.


On Tue, Nov 19, 2019 at 5:26 PM Hyukjin Kwon <gurwls223@gmail.com> wrote:

> > Should Hadoop 2 + Hive 2 be considered to work on JDK 11?
> This seems to be under investigation in Yuming's PR (
> https://github.com/apache/spark/pull/26533) if I am not mistaken.
>
> Oh, yes, what I meant by (default) was the default profiles we will use in
> Spark.
>
>
> On Wed, Nov 20, 2019 at 10:14 AM, Sean Owen <srowen@gmail.com> wrote:
>
>> Should Hadoop 2 + Hive 2 be considered to work on JDK 11? I wasn't
>> sure if 2.7 did, but honestly I've lost track.
>> Anyway, it doesn't matter much as the JDK doesn't cause another build
>> permutation. All are built targeting Java 8.
>>
>> I also don't know if we have to declare a binary release a default.
>> The published POM will be agnostic to Hadoop / Hive; well, it will
>> link against a particular version but can be overridden. That's what
>> you're getting at?
>>
>>
>> On Tue, Nov 19, 2019 at 7:11 PM Hyukjin Kwon <gurwls223@gmail.com> wrote:
>> >
>> > So, are we able to conclude our plans as below?
>> >
>> > 1. In Spark 3, we release as below:
>> >   - Hadoop 3.2 + Hive 2.3 + JDK8 build that also works on JDK 11
>> >   - Hadoop 2.7 + Hive 2.3 + JDK8 build that also works on JDK 11
>> >   - Hadoop 2.7 + Hive 1.2.1 (fork) + JDK8 (default)
>> >
>> > 2. In Spark 3.1, we target:
>> >   - Hadoop 3.2 + Hive 2.3 + JDK8 build that also works on JDK 11
>> >   - Hadoop 2.7 + Hive 2.3 + JDK8 build that also works on JDK 11 (default)
>> >
>> > 3. Avoid removing the "Hadoop 2.7 + Hive 1.2.1 (fork) + JDK8 (default)"
>> >    combo right after cutting branch-3, to see whether Hive 2.3 is
>> >    considered stable in general.
>> >    I roughly suspect it would be a couple of months after the Spark 3.0
>> >    release (?).
>> >
>> > BTW, maybe we should officially note that "Hadoop 2.7 + Hive 1.2.1
>> > (fork) + JDK8 (default)" combination is deprecated anyway in Spark 3.
>> >
>>
>
