spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jey Kottalam <>
Subject Re: Experience using binary packages on various Hadoop distros
Date Tue, 24 Mar 2015 23:16:32 GMT
Could we gracefully fallback to an in-tree Hadoop binary (e.g. 1.0.4)
in that case? I think many new Spark users are confused about why
Spark has anything to do with Hadoop, e.g. I could see myself being
confused when the download page asks me to select a "package type". I
know that what I want is not "source code", but I'd have no idea how
to choose amongst the apparently multiple types of binaries.

On Tue, Mar 24, 2015 at 2:28 PM, Matei Zaharia <> wrote:
> Just a note, one challenge with the BYOH version might be that users who download that
can't run in local mode without also having Hadoop. But if we describe it correctly then hopefully
it's okay.
> Matei
>> On Mar 24, 2015, at 3:05 PM, Patrick Wendell <> wrote:
>> Hey All,
>> For a while we've published binary packages with different Hadoop
>> client's pre-bundled. We currently have three interfaces to a Hadoop
>> cluster (a) the HDFS client (b) the YARN client (c) the Hive client.
>> Because (a) and (b) are supposed to be backwards compatible
>> interfaces. My working assumption was that for the most part (modulo
>> Hive) our packages work with *newer* Hadoop versions. For instance,
>> our Hadoop 2.4 package should work with HDFS 2.6 and YARN 2.6.
>> However, I have heard murmurings that these are not compatible in
>> practice.
>> So I have three questions I'd like to put out to the community:
>> 1. Have people had difficulty using 2.4 packages with newer Hadoop
>> versions? If so, what specific incompatibilities have you hit?
>> 2. Have people had issues using our binary Hadoop packages in general
>> with commercial or Apache Hadoop distro's, such that you have to build
>> from source?
>> 3. How would people feel about publishing a "bring your own Hadoop"
>> binary, where you are required to point us to a local Hadoop
>> distribution by setting HADOOP_HOME? This might be better for ensuring
>> full compatibility:
>> - Patrick
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> For additional commands, e-mail:
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message