spark-dev mailing list archives

From Marcelo Vanzin <van...@cloudera.com>
Subject Re: Experience using binary packages on various Hadoop distros
Date Wed, 25 Mar 2015 18:51:36 GMT
Hey Patrick,

The only issue I've seen so far has been the YARN container ID issue.
That can technically be described as a breakage in forwards
compatibility in YARN. The APIs didn't break, but the format of the
data transferred through YARN's protocol changed, and the old library
cannot understand the data sent by a newer service (the new container
ID).
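
To illustrate that failure mode, here is a minimal Python sketch. It
assumes the newer ID format adds an epoch field after the prefix
(container_e<epoch>_...), which is not spelled out in this thread. The
function names parse_old and parse_new are hypothetical, and this is
not YARN's actual parser.

```python
# Assumed ID layouts (an assumption, not taken from this thread):
#   old: container_<clusterTimestamp>_<appId>_<attemptId>_<containerId>
#   new: container_e<epoch>_<clusterTimestamp>_<appId>_<attemptId>_<containerId>


def parse_old(container_id: str) -> dict:
    """Mimic an old client library: exactly four numeric fields are expected."""
    parts = container_id.split("_")
    if len(parts) != 5 or parts[0] != "container":
        raise ValueError(f"Invalid ContainerId: {container_id}")
    try:
        timestamp, app, attempt, container = (int(p) for p in parts[1:])
    except ValueError:
        raise ValueError(f"Invalid ContainerId: {container_id}") from None
    return {
        "timestamp": timestamp,
        "app": app,
        "attempt": attempt,
        "container": container,
    }


def parse_new(container_id: str) -> dict:
    """Mimic a newer library: accept an optional e<epoch> field, default 0."""
    parts = container_id.split("_")
    epoch = 0
    if len(parts) == 6 and parts[1].startswith("e"):
        epoch = int(parts[1][1:])
        parts = [parts[0]] + parts[2:]  # drop the epoch field
    fields = parse_old("_".join(parts))
    fields["epoch"] = epoch
    return fields


old_id = "container_1410901177871_0001_01_000005"
new_id = "container_e17_1410901177871_0001_01_000005"

print(parse_new(old_id))  # newer library still reads old IDs
print(parse_new(new_id))  # newer library reads new IDs
try:
    parse_old(new_id)     # old library rejects the new ID
except ValueError as err:
    print(err)
```

The method signatures are identical in both cases; only the string on
the wire differs, which is why this counts as a forwards-compatibility
break rather than an API break.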

The main issue with publishing BYOH is what Matei already mentioned.
It would be worth looking at what other projects that depend on
Hadoop do, though.

Speaking with my Cloudera hat on, Spark in CDH is already "BYOH",
except Hadoop is already there with the rest of CDH.


On Tue, Mar 24, 2015 at 12:05 PM, Patrick Wendell <pwendell@gmail.com> wrote:
> Hey All,
>
> For a while we've published binary packages with different Hadoop
> clients pre-bundled. We currently have three interfaces to a Hadoop
> cluster: (a) the HDFS client, (b) the YARN client, and (c) the Hive
> client.
>
> Because (a) and (b) are supposed to be backwards-compatible
> interfaces, my working assumption was that for the most part (modulo
> Hive) our packages work with *newer* Hadoop versions. For instance,
> our Hadoop 2.4 package should work with HDFS 2.6 and YARN 2.6.
> However, I have heard murmurings that these are not compatible in
> practice.
>
> So I have three questions I'd like to put out to the community:
>
> 1. Have people had difficulty using 2.4 packages with newer Hadoop
> versions? If so, what specific incompatibilities have you hit?
> 2. Have people had issues using our binary Hadoop packages in general
> with commercial or Apache Hadoop distros, such that you have to build
> from source?
> 3. How would people feel about publishing a "bring your own Hadoop"
> binary, where you are required to point us to a local Hadoop
> distribution by setting HADOOP_HOME? This might be better for ensuring
> full compatibility:
> https://issues.apache.org/jira/browse/SPARK-6511
>
> - Patrick
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> For additional commands, e-mail: dev-help@spark.apache.org
>



-- 
Marcelo
