spark-dev mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: Drop support for old Hive in Spark 3.0?
Date Fri, 26 Oct 2018 17:07:16 GMT
OK let's keep this about Hive.

Right, good point, this is really about supporting metastore versions, and
there is a good argument for retaining backwards-compatibility with older
metastores. I don't know how far, but I guess, as far as is practical?

Isn't there still a lot of Hive 0.x test code? Is that something that's
safe to drop for 3.0?

And, basically, what must we do to get rid of the Hive fork? That seems
like a must-do.
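
For context on what "supporting metastore versions" means in practice, here
is a minimal sketch of how a metastore client version is usually selected
per application. The two keys, spark.sql.hive.metastore.version and
spark.sql.hive.metastore.jars, are Spark SQL settings; the helper name
metastore_conf_args and the example values "1.2.1" and "maven" are
illustrative assumptions, not a statement of what any given release supports.

```python
# Sketch only: builds spark-submit arguments; it does not start Spark.
# The helper name and the example values are assumptions for illustration.
def metastore_conf_args(version, jars="maven"):
    """Return --conf arguments that select a Hive metastore client."""
    settings = {
        # Version of the Hive metastore the application talks to.
        "spark.sql.hive.metastore.version": version,
        # Where the matching Hive client jars come from.
        "spark.sql.hive.metastore.jars": jars,
    }
    args = []
    for key, value in settings.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Example: arguments for an application using a 1.2.1 metastore.
    print(" ".join(metastore_conf_args("1.2.1")))
```

Because the version is chosen per application, clusters on different Spark
releases could in principle point at the same metastore, which is the
sharing scenario raised in the quoted message below.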



On Fri, Oct 26, 2018 at 11:51 AM Dongjoon Hyun <dongjoon.hyun@gmail.com>
wrote:

> Hi, Sean and All.
>
> For the first question, we only support Hive Metastore versions 1.x
> through 2.x. We can also support Hive Metastore 3.0 at the same time;
> Spark is designed for that.
>
> I don't think we need to drop support for old Hive Metastore versions.
> Is the goal to avoid sharing a Hive Metastore between Spark 2 and
> Spark 3 clusters?
>
> I think we should allow that use case, especially for new Spark 3
> clusters. What do you think?
>
>
> For the second question, Apache Spark 2.x doesn't officially support Hive.
> It's only a best-effort approach within the bounds of Spark.
>
>
> http://spark.apache.org/docs/latest/sql-programming-guide.html#unsupported-hive-functionality
>
> http://spark.apache.org/docs/latest/sql-programming-guide.html#incompatible-hive-udf
>
>
> Beyond the documented differences, the decimal literal issue (HIVE-17186)
> changes query results even in a well-known benchmark like TPC-H.
>
> Bests,
> Dongjoon.
>
> PS. For Hadoop, let's have another thread if needed. I expect another long
> story. :)
>
>
> On Fri, Oct 26, 2018 at 7:11 AM Sean Owen <srowen@gmail.com> wrote:
>
>> Here's another thread to start considering, and I know it's been raised
>> before.
>> What version(s) of Hive should Spark 3 support?
>>
>> If at least we know it won't include Hive 0.x, could we go ahead and
>> remove those tests from master? It might significantly reduce the run time
>> and flakiness.
>>
>> It seems that maintaining even the Hive 1.x fork is untenable going
>> forward, right? Does that also imply this support is almost certainly not
>> maintained in 3.0?
>>
>> Per below, it seems like it might even be hard to both support Hive 3 and
>> Hadoop 2 at the same time?
>>
>> And while we're at it, what are the pros and cons of supporting only
>> Hadoop 3 in Spark 3? Is the difference in client / HDFS API even that big?
>> Or what about focusing only on Hadoop 2.9.x support + 3.x support?
>>
>> Lots of questions, just interested now in informal reactions, not a
>> binding decision.
>>
>> On Thu, Oct 25, 2018 at 11:49 PM Dagang Wei <notifications@github.com>
>> wrote:
>>
>>> Do we really want to switch to Hive 2.3? From this page
>>> https://hive.apache.org/downloads.html, Hive 2.3 works with Hadoop 2.x
>>> (Hive 3.x works with Hadoop 3.x).
>>>
>>> —
>>> You are receiving this because you were mentioned.
>>> Reply to this email directly, view it on GitHub
>>> <https://github.com/apache/spark/pull/21588#issuecomment-433285287>, or
>>> mute the thread
>>> <https://github.com/notifications/unsubscribe-auth/AAyM-sRygel3il6Ne4FafD5BQ7NDSJ7Mks5uopRlgaJpZM4Usweh>
>>> .
>>>
>>
