spark-dev mailing list archives

From Hyukjin Kwon <>
Subject Re: Removing the usage of forked `hive` in Apache Spark 3.0 (SPARK-20202)
Date Tue, 19 Nov 2019 07:32:45 GMT
I struggled with this issue multiple times over the past year, and thankfully we
finally decided to use the official Hive 2.3.x as well (thank you, Yuming,
Alan, and everyone involved). I think starting to use the official version of
Hive is already huge progress.

I think we should allow at least one minor release cycle for users to test
Spark with Hive 2.3.x before switching it to the default. My impression was
that this decision was made before at:

How about we aim to make it the default in Spark 3.1, using this thread as a
reference? I think switching it now would be too radical a change.

On Tue, Nov 19, 2019 at 2:11 PM, Dongjoon Hyun <> wrote:

> Hi, All.
> First of all, I want to put this as a policy issue instead of a technical
> issue.
> Also, this is orthogonal to the `hadoop` version discussion.
> The Apache Spark community kept (not maintained) the forked Apache Hive
> 1.2.1 because there were no other options before. As we see in
> SPARK-20202, it's not a desirable situation among Apache projects.
> Please note that we say `kept`, not `maintained`, because we know it's
> not a good practice.
> There were several attempts to update that forked repository
> for various reasons (Hadoop 3 support is one example),
> but those attempts were also turned down.
> As of Apache Spark 3.0, it seems that we have a feasible new option, the
> `hive-2.3` profile. What about moving further in this direction?
> For example, can we officially and completely remove the usage of the
> forked `hive` in Apache Spark 3.0? If someone still needs to use the
> forked `hive`, we can have a `hive-1.2` profile. Of course, it should
> not be the default profile in the community.
> I want to say this is a goal we should achieve someday.
> If we don't do anything, nothing happens. At the very least, we need to
> prepare for this. Without any preparation, Spark 3.1+ will be the same.
> Shall we focus on what our actual problems with Hive 2.3.6 are?
> If the only concern is that we haven't used it before, we can release
> another `3.0.0-preview` for that.
> Bests,
> Dongjoon.
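For context, the profile choice discussed above maps onto Spark's Maven build. A hedged sketch of the two build variants (the `-Phive-2.3` / `-Phive-1.2` profile names match the quoted proposal; the surrounding flags are the usual Spark build options and should be verified against the build documentation of your checkout):

```shell
# Build Spark against the official Hive 2.3.x (the proposed future default)
./build/mvn -Phive -Phive-thriftserver -Phive-2.3 -DskipTests clean package

# Build against the forked Hive 1.2 (the proposed opt-in legacy profile)
./build/mvn -Phive -Phive-thriftserver -Phive-1.2 -DskipTests clean package
```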
