spark-dev mailing list archives

From Patrick Wendell <pwend...@gmail.com>
Subject Re: Remove Hadoop 1 support (Hadoop <2.2) for Spark 1.5?
Date Fri, 12 Jun 2015 16:12:00 GMT
I feel this is quite different from the Java 6 decision and personally
I don't see sufficient cause to do it.

I would like to understand, though, Sean: what exactly is the proposal?
Hadoop 2 itself supports all of the Hadoop 1 APIs, so I don't think it
makes much sense to remove things like the Hadoop 1 variant of
sc.hadoopFile, since so many libraries still use those APIs.
For YARN support, we already don't support Hadoop 1. So I'll assume
what you mean is to stop supporting linking against the Hadoop 1
filesystem binaries at runtime (is that right?).

The main reason I'd push back is that I do think there are still
people running the older versions. For instance, at Databricks we use
the FileSystem library for talking to S3... every time we've tried to
upgrade to Hadoop 2.x there have been significant performance
regressions and we've had to downgrade. That's purely anecdotal, but I
think there are people out there using the Hadoop 1 bindings for whom
upgrading would be a pain.

In terms of maintenance cost, the much bigger cost for us IMO is
dealing with differences between e.g. 2.2, 2.4, and 2.6, where
major new APIs were added. In comparison, the cost of Hadoop 1 vs 2
seems fairly low, with just a few bugs cropping up here and there. So
unlike Java 6, where there is a critical mass of maintenance issues,
security issues, etc., I just don't see as compelling a cost here.

To me, the framework for deciding about these upgrades is weighing the
maintenance cost against the inconvenience for users.

- Patrick

On Fri, Jun 12, 2015 at 8:45 AM, Nicholas Chammas
<nicholas.chammas@gmail.com> wrote:
> I'm personally in favor, but I don't have a sense of how many people still
> rely on Hadoop 1.
>
> Nick
>
> On Fri, Jun 12, 2015 at 9:13 AM, Steve Loughran
> <stevel@hortonworks.com> wrote:
>
>> +1 for 2.2+
>>
>> Not only are the APIs in Hadoop 2 better, there are more people testing
>> Hadoop 2.x & Spark, and bugs in Hadoop itself are being fixed.
>>
>> (usual disclaimers, I work off branch-2.7 snapshots I build nightly, etc)
>>
>> > On 12 Jun 2015, at 11:09, Sean Owen <sowen@cloudera.com> wrote:
>> >
>> > How does the idea of removing support for Hadoop 1.x for Spark 1.5
>> > strike everyone? Really, I mean, Hadoop < 2.2, as 2.2 seems to me more
>> > consistent with the modern 2.x line than 2.1 or 2.0.
>> >
>> > The arguments against are simply, well, someone out there might be
>> > using these versions.
>> >
>> > The arguments for are just simplification -- fewer gotchas in trying
>> > to keep supporting older Hadoop, of which we've seen several lately.
>> > We get to chop out a little bit of shim code and update to use some
>> > non-deprecated APIs. Along with removing support for Java 6, it might
>> > be a reasonable time to also draw a line under older Hadoop too.
>> >
>> > I'm just gauging feeling now: for, against, indifferent?
>> > I favor it, but would not push hard on it if there are objections.
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>> > For additional commands, e-mail: dev-help@spark.apache.org
>> >
>>
>>
>
