spark-dev mailing list archives

From Dongjoon Hyun <dongjoon.h...@gmail.com>
Subject Removing the usage of forked `hive` in Apache Spark 3.0 (SPARK-20202)
Date Tue, 19 Nov 2019 05:11:12 GMT
Hi, All.

First of all, I want to frame this as a policy issue rather than a technical
issue.
Also, this is orthogonal to the `hadoop` version discussion.

The Apache Spark community has kept (not maintained) the forked Apache Hive
1.2.1 because there were no other options before. As we see in
SPARK-20202, it's not a desirable situation among the Apache projects.

    https://issues.apache.org/jira/browse/SPARK-20202

Also, please note that we `kept`, not `maintained`, because we know it's
not good.
There were several attempts to update that forked repository
for several reasons (Hadoop 3 support is one example),
but those attempts were also turned down.

From Apache Spark 3.0, it seems that we have a new feasible option: the
`hive-2.3` profile. What about moving further in this direction?

For example, can we completely and officially remove the usage of forked
`hive` in Apache Spark 3.0? If someone still needs to use the forked `hive`,
we can have a `hive-1.2` profile. Of course, it should not be a default
profile in the community.

I want to say this is a goal we should achieve someday.
If we don't do anything, nothing will happen. At least we need to prepare
for this. Without any preparation, Spark 3.1+ will be in the same situation.

Shall we focus on what our problems with Hive 2.3.6 are?
If the only reason is that we haven't used it before, we can release another
`3.0.0-preview` for that.

Bests,
Dongjoon.
