spark-dev mailing list archives

From Dongjoon Hyun <dongjoon.h...@gmail.com>
Subject Re: Drop support for old Hive in Spark 3.0?
Date Fri, 26 Oct 2018 16:51:42 GMT
Hi, Sean and All.

For the first question, we currently support only Hive Metastore 1.x
through 2.x, and we can support Hive Metastore 3.0 alongside them. Spark is
designed to work with multiple metastore versions.
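
As a rough sketch of what that looks like in practice: the metastore client
version is selected per application through the Spark SQL options
`spark.sql.hive.metastore.version` and `spark.sql.hive.metastore.jars`. The
helper names, the thrift host, and the version strings below are made up for
illustration only, and the metastore URI could just as well come from
hive-site.xml.

```python
# Sketch only: the three config keys are real Spark options; the helper names,
# host name, and version strings are illustrative assumptions.

def metastore_conf(version, uri, jars="maven"):
    """Return Spark conf entries for a Hive Metastore client of `version`."""
    return {
        "spark.sql.hive.metastore.version": version,
        "spark.sql.hive.metastore.jars": jars,
        "spark.hadoop.hive.metastore.uris": uri,
    }


def to_submit_args(conf):
    """Render the settings as `--conf key=value` arguments for spark-submit."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


# Two clusters pointing at the same (hypothetical) metastore service,
# each with its own client version.
shared_uri = "thrift://metastore.example.com:9083"
old_cluster = metastore_conf("1.2.1", shared_uri)
new_cluster = metastore_conf("2.3.3", shared_uri)

print(" ".join(to_submit_args(old_cluster)))
print(" ".join(to_submit_args(new_cluster)))
```

Both applications share one metastore service here; only the client version
differs per application.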

I don't think we need to drop support for old Hive Metastore versions. Is
the goal to avoid sharing a Hive Metastore between Spark 2 and Spark 3
clusters?

I think we should allow that use case, especially for new Spark 3
clusters. What do you think?


For the second question, Apache Spark 2.x doesn't officially support Hive.
Hive compatibility is only a best-effort approach within the boundaries of
Spark.

http://spark.apache.org/docs/latest/sql-programming-guide.html#unsupported-hive-functionality
http://spark.apache.org/docs/latest/sql-programming-guide.html#incompatible-hive-udf


Beyond the documented differences, decimal literals (HIVE-17186) also
produce different query results, even in a well-known benchmark like TPC-H.

Best,
Dongjoon.

PS. For Hadoop, let's have another thread if needed. I expect another long
story. :)


On Fri, Oct 26, 2018 at 7:11 AM Sean Owen <srowen@gmail.com> wrote:

> Here's another thread to start considering, and I know it's been raised
> before.
> What version(s) of Hive should Spark 3 support?
>
> If at least we know it won't include Hive 0.x, could we go ahead and
> remove those tests from master? It might significantly reduce the run time
> and flakiness.
>
> It seems that maintaining even the Hive 1.x fork is untenable going
> forward, right? Does that also imply this support is almost certainly not
> maintained in 3.0?
>
> Per below, it seems like it might even be hard to both support Hive 3 and
> Hadoop 2 at the same time?
>
> And while we're at it, what's the + and - for simply only supporting
> Hadoop 3 in Spark 3? Is the difference in client / HDFS API even that big?
> Or what about focusing only on Hadoop 2.9.x support + 3.x support?
>
> Lots of questions, just interested now in informal reactions, not a
> binding decision.
>
> On Thu, Oct 25, 2018 at 11:49 PM Dagang Wei <notifications@github.com>
> wrote:
>
>> Do we really want to switch to Hive 2.3? From this page
>> https://hive.apache.org/downloads.html, Hive 2.3 works with Hadoop 2.x
>> (Hive 3.x works with Hadoop 3.x).
>>
>> View it on GitHub:
>> <https://github.com/apache/spark/pull/21588#issuecomment-433285287>
>>
>
