spark-dev mailing list archives

From Patrick Wendell <pwend...@gmail.com>
Subject Re: spark-ec2 default to Hadoop 2
Date Sun, 01 Mar 2015 23:40:29 GMT
Yeah, calling it Hadoop 2 was a very bad naming choice (of mine!). This
was back when CDH4 was the only real distribution available with some
of the newer Hadoop APIs and packaging.

I think to not surprise people using this, it's best to keep v1 as the
default. Overall, we try not to change default values too often to
make upgrading easy for people.

- Patrick

On Sun, Mar 1, 2015 at 3:14 PM, Shivaram Venkataraman
<shivaram@eecs.berkeley.edu> wrote:
> One reason I wouldn't change the default is that the Hadoop 2 launched by
> spark-ec2 is not a full Hadoop 2 distribution -- it's more of a hybrid
> Hadoop version built using CDH4 (it uses HDFS 2, but not YARN AFAIK).
>
> Also our default Hadoop version in the Spark build is still 1.0.4 [1], so
> it makes sense to stick to that in spark-ec2 as well?
>
> [1] https://github.com/apache/spark/blob/master/pom.xml#L122
>
> Thanks
> Shivaram
>
> On Sun, Mar 1, 2015 at 2:59 PM, Nicholas Chammas <nicholas.chammas@gmail.com> wrote:
>
>>
>> https://github.com/apache/spark/blob/fd8d283eeb98e310b1e85ef8c3a8af9e547ab5e0/ec2/spark_ec2.py#L162-L164
>>
>> Is there any reason we shouldn't update the default Hadoop major version in
>> spark-ec2 to 2?
>>
>> Nick
>>
