spark-dev mailing list archives

From Steve Loughran <ste...@hortonworks.com>
Subject Re: Hadoop 3 support
Date Tue, 03 Apr 2018 20:33:03 GMT


On 3 Apr 2018, at 01:30, Saisai Shao <sai.sai.shao@gmail.com> wrote:

Yes, the main blocking issue is that the Hive version used in Spark (1.2.1.spark) doesn't support
running on Hadoop 3. Hive checks the Hadoop version at runtime [1]. Besides this, I think
some pom changes should be enough to support Hadoop 3.

If we want to use the Hadoop 3 shaded client jar, then the pom requires lots of changes, but this
is not necessary.


[1] https://github.com/apache/hive/blob/6751225a5cde4c40839df8b46e8d241fdda5cd34/shims/common/src/main/java/org/apache/hadoop/hive/shims/ShimLoader.java#L144

2018-04-03 4:57 GMT+08:00 Marcelo Vanzin <vanzin@cloudera.com>:
Saisai filed SPARK-23534, but the main blocking issue is really SPARK-18673.


On Mon, Apr 2, 2018 at 1:00 PM, Reynold Xin <rxin@databricks.com> wrote:
> Does anybody know what needs to be done in order for Spark to support Hadoop
> 3?
>


To be ruthless, I'd view Hadoop 3.1 as the first one to play with... 3.0.x was more of a wide-version
check. Hadoop 3.1 RC0 is out this week, making it the ideal (last!) time to find showstoppers.

1. I've got a PR which adds a profile to build Spark against Hadoop 3, with some fixes for
the ZK import along with a better hadoop-cloud profile:

https://github.com/apache/spark/pull/20923


Apply that patch and both mvn and sbt can build with the RC0 from the ASF staging repo:

build/sbt -Phadoop-3,hadoop-cloud,yarn -Psnapshots-and-staging
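For reference, the mvn side of the same build would presumably look like this (the profile names come from the PR above; the -DskipTests flag is my own assumption, just to keep a first smoke-test build fast):

```shell
# Hypothetical mvn equivalent of the sbt line above; the profiles are from
# the PR, -DskipTests is an assumption for a quick compile-only check.
build/mvn -Phadoop-3,hadoop-cloud,yarn -Psnapshots-and-staging -DskipTests package
```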



2. Everything Marcelo says about Hive.

You can build Hadoop locally with -Dhadoop.version=2.11 and the Hive 1.2.1-spark version
check goes through. You can't safely bring up HDFS like that, but you can run Spark standalone
against things.
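The reason the faked version number works: Hive 1.2.1's ShimLoader keys off only the major component of the Hadoop version string, so "2.11" passes while anything 3.x fails at runtime. A minimal sketch of that kind of gate (my own illustration, not Hive's actual code):

```shell
# Illustration only: pick a "shim" by the major part of a Hadoop version
# string, the way Hive 1.2.1's ShimLoader does. Majors 1 and 2 are known;
# anything else is rejected.
hadoop_major() {
  echo "${1%%.*}"
}

shim_for() {
  case "$(hadoop_major "$1")" in
    1|2) echo "supported" ;;
    *)   echo "unrecognized Hadoop major version: $1" ;;
  esac
}

shim_for 2.11    # prints "supported" -- the faked version sails through
shim_for 3.1.0   # rejected
```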

Some strategies:

Short term: build a new hive-1.2.x-spark which fixes up the version check and merges in those
critical patches that Cloudera, Hortonworks, Databricks, + anyone else has got in for their
production systems. I don't think we have that many.

That leaves a "how to release" story, as the ASF will want it to come out under the ASF auspices,
and, given the liability disclaimers, so should everyone. The Hive team could be "invited"
to publish it as their own if people ask nicely.

Long term:
 - do something about that subclassing to get the Thrift endpoint to work. That can include
fixing Hive's service to be subclass-friendly.
 - move to Hive 2

That's a major piece of work.
