spark-dev mailing list archives

From Steve Loughran <ste...@hortonworks.com>
Subject Re: Hadoop 3 support
Date Tue, 03 Apr 2018 20:38:16 GMT


On 3 Apr 2018, at 01:30, Saisai Shao <sai.sai.shao@gmail.com> wrote:

Yes, the main blocking issue is that the Hive version used in Spark (1.2.1.spark) doesn't
support running on Hadoop 3: Hive checks the Hadoop version at runtime [1]. Beyond this, I
think some POM changes should be enough to support Hadoop 3.

If we want to use the Hadoop 3 shaded client JAR, the POM needs a lot of changes, but this
is not necessary.


[1] https://github.com/apache/hive/blob/6751225a5cde4c40839df8b46e8d241fdda5cd34/shims/common/src/main/java/org/apache/hadoop/hive/shims/ShimLoader.java#L144
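To make the blocking issue concrete, here is a minimal Java sketch of the kind of runtime check the linked ShimLoader code performs: parse the Hadoop version string, map the major version to a shim, and fail on anything unrecognized. It is not Hive's code. The class name `HadoopVersionCheck`, the method `shimFor`, the shim names and the exception messages are hypothetical. Hive reads the version from Hadoop's `VersionInfo.getVersion()`; here the string is passed in so the sketch runs without Hadoop on the classpath.

```java
// Hypothetical sketch of a Hive 1.2-style runtime version gate.
// Not Hive's actual implementation; names and messages are illustrative.
final class HadoopVersionCheck {

    // Shim names assumed for illustration only.
    static final String HADOOP20S_SHIM = "0.20S";
    static final String HADOOP23_SHIM = "0.23";

    private HadoopVersionCheck() {}

    /** Maps a Hadoop version string such as "2.7.3" to a shim name. */
    static String shimFor(String hadoopVersion) {
        String[] parts = hadoopVersion.split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException(
                "Illegal Hadoop version: " + hadoopVersion + " (expected A.B.* format)");
        }
        int major = Integer.parseInt(parts[0]);
        switch (major) {
            case 1:
                return HADOOP20S_SHIM;
            case 2:
                return HADOOP23_SHIM;
            default:
                // Any 3.x version string falls through to here, so startup fails.
                throw new IllegalArgumentException(
                    "Unrecognized Hadoop major version number: " + hadoopVersion);
        }
    }

    public static void main(String[] args) {
        for (String v : new String[] {"1.2.1", "2.7.3", "3.0.0"}) {
            try {
                System.out.println(v + " -> " + shimFor(v));
            } catch (IllegalArgumentException e) {
                System.out.println(v + " -> " + e.getMessage());
            }
        }
    }
}
```

Because the gate is a closed switch over known major versions, no amount of POM work alone gets past it on Hadoop 3; the check itself (or the Hive version carrying it) has to change.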


I don't think the hadoop-shaded JAR is complete enough for Spark yet; it was very much driven
by HBase's needs. But there's only one way to get Hadoop to fix that: try the move, find the
problems, complain noisily. Then Hadoop 3.2 and/or a 3.1.x release (x >= 1) can have the
broader shading.

Assume my name is next to the "Shade hadoop-cloud-storage" problem, though given that
aws-java-sdk-bundle is already 50 MB, I don't plan to shade that at all. The AWS shading
already isolates everything from Amazon's choice of Jackson, which was one of the sore points.

-Steve
