spark-dev mailing list archives

From Chris Fregly <>
Subject Re: Straw poll: dropping support for things like Scala 2.10
Date Fri, 28 Oct 2016 10:47:03 GMT
I seem to remember a large Spark user (Tencent, I believe) chiming in late during these discussions 6-12 months ago and squashing any sort of deprecation, given the massive effort that would be required to upgrade their environment.

I just want to make sure these conversations take large Spark users into consideration - and reflect the real world versus the ideal world.

Otherwise, this is all for naught like last time.

> On Oct 28, 2016, at 10:43 AM, Sean Owen <> wrote:
> If the subtext is vendors, then I'd have a look at what recent distros look like. I'll
write about CDH as a representative example, but I think other distros are naturally similar.
> CDH has been on Java 8, Hadoop 2.6, and Python 2.7 for almost two years (CDH 5.3 / Dec 2014).
Granted, this depends on installing on an OS with that Java / Python version, but Java 8 /
Python 2.7 is available for all of the supported OSes. The population that isn't on CDH 4
(because that support was dropped in Spark a long time ago), but is on a version released
2-2.5 years ago and won't update, is a couple percent of the installed base. They do not
in general want anything to change at all.
> I assure everyone that vendors too are aligned in wanting to cater to the crowd that
wants the most recent version of everything. For example, CDH offers both Spark 2.0.1 and
1.6 at the same time.
> I wouldn't dismiss vendor support for these supporting components as a relevant proxy for whether
they are worth supporting in Spark. Java 7 is long since EOL (no, I don't count paying Oracle
for support). No vendor is supporting Hadoop < 2.6. Scala 2.10 was EOL at the end of 2014.
Is there a criterion here that reaches a different conclusion about these things just for Spark?
This was roughly the same conversation that happened 6 months ago.
> I imagine we're going to find that in about 6 months it'll make more sense all around
to remove these. If we can just give a heads up with deprecation and then kick the can down
the road a bit more, that sounds like enough for now.
>> On Fri, Oct 28, 2016 at 8:58 AM Matei Zaharia <> wrote:
>> Deprecating them is fine (and I know they're already deprecated), the question is
just whether to remove them. For example, what exactly is the downside of having Python 2.6
or Java 7 right now? If it's high, then we can remove them, but I just haven't seen a ton
of details. It also sounded like fairly recent versions of CDH, HDP, RHEL, etc. still ship
old versions of these.
>> Just talking with users, I've seen many people who say "we have a Hadoop cluster
from $VENDOR, but we just download Spark from Apache and run newer versions of that". That's
great for Spark IMO, and we need to stay compatible even with somewhat older Hadoop installs
because they are time-consuming to update. Having the whole community on a small set of versions
leads to a better experience for everyone and also to more of a "network effect": more people
can battle-test new versions, answer questions about them online, write libraries that easily
reach the majority of Spark users, etc.
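As a concrete illustration of the maintenance cost being weighed in this thread: code that must keep running on Python 2.6 has to avoid syntax the rest of the ecosystem takes for granted. The snippet below (illustrative only, not from Spark's codebase) uses two features that were added in Python 2.7, so any 2.6-compatible code has to rewrite them in more verbose forms.

```python
# Two constructs introduced in Python 2.7. Code that must still support
# Python 2.6 cannot use either and needs clunkier equivalents such as
# dict((n, n * n) for n in ...) and set([0, 2, 4]).
squares = {n: n * n for n in range(4)}  # dict comprehension: SyntaxError on 2.6
evens = {0, 2, 4}                       # set literal: SyntaxError on 2.6
print(squares[3], sorted(evens))        # prints: 9 [0, 2, 4]
```

Every such workaround is a small tax on maintainers and reviewers, which is one reason dropping 2.6 keeps coming up.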
