spark-dev mailing list archives

From Xiao Li <lix...@databricks.com>
Subject Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?
Date Sat, 02 Nov 2019 22:28:08 GMT
The changes for JDK 11 support do not increase the risk of the Hadoop 3.2
profile.

Hive 1.2.1 execution JARs are much more stable than Hive 2.3.6 execution
JARs. The changes to the thrift-server are massive. We need more evidence to
prove the quality and stability before switching the default to the Hadoop
3.2 profile. Adoption of Spark 3.0 is more important at the moment. I think
we can switch the default profile in the Spark 3.1 or 3.2 releases, instead
of Spark 3.0.
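For concreteness, a rough sketch of what the default-profile choice means
for people building from source (the Maven invocations below follow Spark's
documented build profiles; treat them as illustrative, not a recipe):

```shell
# Sketch: how the default profile choice shows up in a source build.
# Today hadoop-2.7 is the default, so Hadoop 3 users opt in explicitly:
HADOOP3_BUILD="./build/mvn -Phadoop-3.2 -DskipTests clean package"
# If hadoop-3.2 became the default, Hadoop 2 users would opt back in instead:
HADOOP2_BUILD="./build/mvn -Phadoop-2.7 -DskipTests clean package"
echo "$HADOOP3_BUILD"
```

Whichever profile is the default, the other remains one flag away; the
debate is only about which set of users has to type it.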


On Fri, Nov 1, 2019 at 6:21 PM Dongjoon Hyun <dongjoon.hyun@gmail.com>
wrote:

> Hi, Xiao.
>
> How can JDK11 support make the `Hadoop-3.2 profile` risky? We build and
> publish with JDK8.
>
> > In this release, Hive execution module upgrade [from 1.2 to 2.3], Hive
> > thrift-server upgrade, and JDK11 support are added to the Hadoop 3.2
> > profile only.
>
> Since we build and publish with JDK8 and the default runtime is still
> JDK8, I don't think `hadoop-3.2 profile` is risky in that context.
>
> For JDK11, the Hive execution module 2.3.6 still doesn't support JDK11 in
> terms of the remote HiveMetastore.
>
> So, among the above reasons, we can say that the Hive execution module
> (with Hive 2.3.6) could be the root cause of potential unknown issues.
>
> In other words, `Hive 1.2.1` is the one you think stable, isn't it?
>
> Although Hive 2.3.6 might not be proven in Apache Spark officially, we
> also resolved several SPARK issues by upgrading Hive from 1.2.1 to 2.3.6.
>
> Bests,
> Dongjoon.
>
>
>
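As background for the metastore point above, a deployment can already pin
the metastore client version independently of the bundled Hive execution
JARs (`spark.sql.hive.metastore.*` are standard Spark SQL options; the
values below are purely illustrative):

```shell
# Sketch: pinning the Hive metastore client version at runtime.
# spark.sql.hive.metastore.* are standard Spark SQL options; values are
# illustrative, not a recommendation.
METASTORE_CONF=(
  --conf spark.sql.hive.metastore.version=2.3.6
  --conf spark.sql.hive.metastore.jars=maven
)
# A real run would pass these to spark-submit or spark-shell, e.g.:
#   spark-shell "${METASTORE_CONF[@]}"
printf '%s\n' "${METASTORE_CONF[@]}"
```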
> On Fri, Nov 1, 2019 at 5:37 PM Jiaxin Shan <seedjeffwan@gmail.com> wrote:
>
>> +1 for Hadoop 3.2. It seems much of the cloud integration work Steve made
>> is only available in 3.2. We see lots of users asking for better S3A
>> support in Spark.
>>
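The S3A support mentioned above is configured through Spark's Hadoop conf
pass-through. A minimal sketch (the `fs.s3a.*` keys are standard Hadoop S3A
options; the values and job names are illustrative):

```shell
# Sketch: S3A-related settings passed through Spark's spark.hadoop.* prefix.
# fs.s3a.* are standard Hadoop S3A options; the values are illustrative.
S3A_CONF=(
  --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
  --conf spark.hadoop.fs.s3a.connection.maximum=64
)
# A real job would then read e.g. s3a://my-bucket/path with these applied:
#   spark-submit "${S3A_CONF[@]}" --class MyApp my-app.jar
printf '%s\n' "${S3A_CONF[@]}"
```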
>> On Fri, Nov 1, 2019 at 9:46 AM Xiao Li <lixiao@databricks.com> wrote:
>>
>>> Hi, Steve,
>>>
>>> Thanks for your comments! My major quality concern is not against Hadoop
>>> 3.2. In this release, Hive execution module upgrade [from 1.2 to 2.3], Hive
>>> thrift-server upgrade, and JDK11 support are added to the Hadoop 3.2 profile
>>> only. Compared with the Hadoop 2.x profile, the Hadoop 3.2 profile is
>>> riskier due to these changes.
>>>
>>> To speed up the adoption of Spark 3.0, which has many other highly
>>> desirable features, I am proposing to keep the Hadoop 2.x profile as the
>>> default.
>>>
>>> Cheers,
>>>
>>> Xiao.
>>>
>>>
>>>
>>> On Fri, Nov 1, 2019 at 5:33 AM Steve Loughran <stevel@cloudera.com>
>>> wrote:
>>>
>>>> What is the current default value? The 2.x releases are becoming EOL:
>>>> 2.7 is dead, there might be a 2.8.x, and for now 2.9 is the branch-2
>>>> release getting attention. 2.10.0 shipped yesterday, but the ".0" means
>>>> there will inevitably be surprises.
>>>>
>>>> One issue with using older versions is that any problem reported
>>>> (especially stack traces you can blame me for) will generally be met by
>>>> a response of "does it go away when you upgrade?" The other issue is how
>>>> much test coverage things are getting.
>>>>
>>>> W.r.t. Hadoop 3.2 stability, nothing major has been reported. The ABFS
>>>> client is there, and the big Guava update (HADOOP-16213) went in. People
>>>> will either love or hate that.
>>>>
>>>> No major changes in the S3A code between 3.2.0 and 3.2.1; I have a large
>>>> backport planned though, including changes to better handle AWS caching of
>>>> 404s generated from HEAD requests before an object was actually created.
>>>>
>>>> It would be really good if the Spark distributions shipped with later
>>>> versions of the Hadoop artifacts.
>>>>
>>>> On Mon, Oct 28, 2019 at 7:53 PM Xiao Li <lixiao@databricks.com> wrote:
>>>>
>>>>> The stability and quality of the Hadoop 3.2 profile are unknown. The
>>>>> changes are massive, including the Hive execution module and a new
>>>>> version of the Hive thrift-server.
>>>>>
>>>>> To reduce the risk, I would like to keep the current default version
>>>>> unchanged. When it becomes stable, we can change the default profile to
>>>>> Hadoop-3.2.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Xiao
>>>>>
>>>>> On Mon, Oct 28, 2019 at 12:51 PM Sean Owen <srowen@gmail.com> wrote:
>>>>>
>>>>>> I'm OK with that, but don't have a strong opinion nor info about the
>>>>>> implications.
>>>>>> That said, my guess is we're close to the point where we don't need to
>>>>>> support Hadoop 2.x anyway, so, yeah.
>>>>>>
>>>>>> On Mon, Oct 28, 2019 at 2:33 PM Dongjoon Hyun <
>>>>>> dongjoon.hyun@gmail.com> wrote:
>>>>>> >
>>>>>> > Hi, All.
>>>>>> >
>>>>>> > There was a discussion on publishing artifacts built with Hadoop 3.
>>>>>> > But, we are still publishing with Hadoop 2.7.3 and `3.0-preview`
>>>>>> > will be the same because we didn't change anything yet.
>>>>>> >
>>>>>> > Technically, we need to change two places for publishing.
>>>>>> >
>>>>>> > 1. Jenkins Snapshot Publishing
>>>>>> >
>>>>>> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/
>>>>>> >
>>>>>> > 2. Release Snapshot/Release Publishing
>>>>>> >
>>>>>> https://github.com/apache/spark/blob/master/dev/create-release/release-build.sh
>>>>>> >
>>>>>> > To minimize the change, we need to switch our default Hadoop
>>>>>> > profile.
>>>>>> >
>>>>>> > Currently, the default is `hadoop-2.7 (2.7.4)` profile and
>>>>>> > `hadoop-3.2 (3.2.0)` is optional.
>>>>>> > We had better use `hadoop-3.2` profile by default and `hadoop-2.7`
>>>>>> > optionally.
>>>>>> >
>>>>>> > Note that this means we use Hive 2.3.6 by default. Only the
>>>>>> > `hadoop-2.7` distribution will use `Hive 1.2.1`, like Apache Spark
>>>>>> > 2.4.x.
>>>>>> >
>>>>>> > Bests,
>>>>>> > Dongjoon.
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>>
>> --
>> Best Regards!
>> Jiaxin Shan
>> Tel:  412-230-7670
>> Address: 470 2nd Ave S, Kirkland, WA
>>
>>

