spark-dev mailing list archives

From Thomas Dudziak
Subject Re: Remove Hadoop 1 support (Hadoop <2.2) for Spark 1.5?
Date Fri, 12 Jun 2015 18:18:21 GMT
-1 to this; we use Spark with an old Hadoop version (well, a fork of an old
version, 0.23). That being said, if there were a nice developer API that
separates Spark from Hadoop (or rather, two APIs, one for scheduling and
one for HDFS), then we'd be happy to maintain our own plugins for those.
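
To make the idea concrete, here is a minimal sketch of what such a split
might look like. It is only an illustration under stated assumptions: every
name below (SchedulerPlugin, StoragePlugin, LocalScheduler, InMemoryStorage)
is hypothetical and is not part of Spark or Hadoop. The two toy
implementations stand in for plugins that a user with a forked Hadoop would
maintain themselves.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class SchedulerPlugin(ABC):
    """Hypothetical scheduling-side interface (not a real Spark API)."""

    @abstractmethod
    def request_executors(self, count: int) -> List[str]:
        """Ask the cluster manager for `count` executors and return their ids."""


class StoragePlugin(ABC):
    """Hypothetical storage-side interface (not a real Spark API)."""

    @abstractmethod
    def read(self, path: str) -> bytes:
        """Return the full contents stored at `path`."""

    @abstractmethod
    def write(self, path: str, data: bytes) -> None:
        """Store `data` at `path`, replacing any existing contents."""


class LocalScheduler(SchedulerPlugin):
    """Toy stand-in for a plugin wrapping a Hadoop 0.23-style scheduler."""

    def request_executors(self, count: int) -> List[str]:
        return [f"local-{i}" for i in range(count)]


class InMemoryStorage(StoragePlugin):
    """Toy stand-in for a plugin wrapping an HDFS client."""

    def __init__(self) -> None:
        self._files: Dict[str, bytes] = {}

    def read(self, path: str) -> bytes:
        return self._files[path]

    def write(self, path: str, data: bytes) -> None:
        self._files[path] = data
```

The point of keeping the two interfaces separate is that a site could swap
in its own scheduler plugin while reusing a stock storage plugin, or the
other way around.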


On Fri, Jun 12, 2015 at 9:42 AM, Sean Owen wrote:

> On Fri, Jun 12, 2015 at 5:12 PM, Patrick Wendell wrote:
> > I would like to understand though Sean - what is the proposal exactly?
> > Hadoop 2 itself supports all of the Hadoop 1 APIs, so things like
> > removing the Hadoop 1 variant of sc.hadoopFile, etc, I don't think
> Not entirely; you can see some binary incompatibilities that have
> bitten recently. A Hadoop 1 program does not in general work on Hadoop
> 2 because of this.
> Part of my thinking is that I'm not sure Hadoop 1.x and 2.0.x fully
> work anymore anyway. See for example SPARK-8057 recently. I recall
> similar problems with Hadoop 2.0.x-era releases and the Spark build
> for them, which is basically the 'cdh4' build.
> So one benefit is skipping whatever work would be needed to keep
> fixing this up, and the argument is that there may be less loss of
> functionality than it seems. The other is being able to use later
> APIs. This much is fairly minor.
> > The main reason I'd push back is that I do think there are still
> > people running the older versions. For instance at Databricks we use
> > the FileSystem library for talking to S3... every time we've tried to
> > upgrade to Hadoop 2.X there have been significant regressions in
> > performance and we've had to downgrade. That's purely anecdotal, but I
> > think you have people out there using the Hadoop 1 bindings for whom
> > upgrade would be a pain.
> Yeah, that's the question. Is anyone out there using 1.x? More
> anecdotes wanted. That might be the most interesting question.
> No CDH customers would have been for a long while now, for example.
> (Still a small number of CDH 4 customers out there though, and that's
> 2.0.x or so, but that's a gray area.)
> Is the S3 library thing really related to Hadoop 1.x? That comes from
> jets3t and that's independent.
> > In terms of maintenance cost, the much bigger cost for us IMO is
> > dealing with differences between e.g. 2.2, 2.4, and 2.6, where
> > major new APIs were added. In comparison the Hadoop 1 vs 2 seems
> Really? I'd say the opposite. No APIs that are only in 2.2, let alone
> only in a later version, can be in use now, right? 1.x wouldn't work
> at all then. I don't know of any binary incompatibilities of the type
> seen between 1.x and 2.x, which we have had to shim to make work.
> In both cases dependencies have to be harmonized here and there, yes.
> That won't change.