spark-dev mailing list archives

From Matei Zaharia <>
Subject Re: time for Apache Spark 3.0?
Date Thu, 05 Apr 2018 17:39:29 GMT
Oh, forgot to add: splitting the source tree across Scala versions also creates a big
maintenance burden for third-party libraries built on Spark. As Josh said on the JIRA:

"I think this is primarily going to be an issue for end users who want to use an existing
source tree to cross-compile for Scala 2.10, 2.11, and 2.12. Thus the pain of the source incompatibility
would mostly be felt by library/package maintainers but it can be worked around as long as
there's at least some common subset which is source compatible across all of those versions."

This means that all the data sources, ML algorithms, etc. developed outside our source tree
would have to do the same thing we do internally.
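
For reference, this kind of cross-compilation is typically done from a single source tree
with sbt's cross-building. A minimal sketch for a hypothetical third-party package (the
name and version numbers are illustrative, not anything published):

    // build.sbt for a hypothetical Spark package; versions are illustrative.
    name := "my-spark-connector"

    scalaVersion := "2.11.12"
    crossScalaVersions := Seq("2.10.7", "2.11.12", "2.12.5")

    // Where the shared subset is not source compatible, sbt also picks up
    // version-specific directories such as src/main/scala-2.11 and
    // src/main/scala-2.12 alongside the common src/main/scala.

Running "sbt +package" (or "+publishLocal") then runs the task once per listed Scala
version; maintaining those per-version source directories is exactly the burden Josh
describes falling on package authors.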

> On Apr 5, 2018, at 10:30 AM, Matei Zaharia <> wrote:
> Sorry, but just to be clear here, this is the 2.12 API issue, with more details in this
> doc:
> Basically, if we are allowed to change Spark’s API a little to have only one version
> of methods that are currently overloaded between Java and Scala, we can get away with a
> single source tree for all Scala versions and Java ABI compatibility against any build
> of Spark (whether using Scala 2.11 or 2.12). On the other hand, if we want to keep the
> API and ABI of the Spark 2.x branch, we’ll need a different source tree for Scala 2.12
> with different copies of pretty large classes such as RDD, DataFrame and DStream, and
> Java users may have to change their code when linking against different versions of Spark.
> This is of course only one of the possible ABI changes, but it is a considerable
> engineering effort, so we’d have to sign up for maintaining all these different source
> files. It seems kind of silly given that Scala 2.12 was released in 2016: we’d be doing
> all this work just to keep ABI compatibility for Scala 2.11, which isn’t even that
> widely used any more for new projects. Also keep in mind that the next Spark release
> will probably take at least 3-4 months, so we’re talking about what people will be
> using in fall 2018.
> Matei
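
To make the overload problem above concrete, here is a minimal sketch with toy stand-ins
rather than Spark's real classes. Under Scala 2.12, Function1 compiles to a functional
interface, so a *Java* caller's lambda matches both overloads below and javac rejects the
call as ambiguous; against a 2.11 build, only the Java-friendly overload can accept a
Java lambda:

    // Toy stand-in for a Java functional interface like
    // org.apache.spark.api.java.function.MapFunction:
    trait JMapFunction[T, U] { def call(t: T): U }

    class MiniDataset[T](val data: Seq[T]) {
      // Scala-friendly overload
      def map[U](f: T => U): MiniDataset[U] = new MiniDataset(data.map(f))
      // Java-friendly overload; against a 2.12 build, a Java-side call like
      // ds.map(x -> x + 1) is applicable to both map methods, hence ambiguous
      def map[U](f: JMapFunction[T, U]): MiniDataset[U] =
        new MiniDataset(data.map(f.call))
    }

Collapsing such pairs into a single method is the small API change being proposed: it
removes the ambiguity at the cost of a minor break.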
>> On Apr 5, 2018, at 10:13 AM, Marcelo Vanzin <> wrote:
>> I remember seeing somewhere that Scala still has some issues with Java
>> 9/10 so that might be hard...
>> But on that topic, it might be better to shoot for Java 11
>> compatibility. 9 and 10, following the new release model, aren't
>> really meant to be long-term releases.
>> In general, agree with Sean here. Doesn't look like 2.12 support
>> requires unexpected API breakages. So unless there's a really good
>> reason to break / remove a bunch of existing APIs...
>> On Thu, Apr 5, 2018 at 9:04 AM, Marco Gaido <> wrote:
>>> Hi all,
>>> I also agree with Mark that we should add Java 9/10 support to an eventual
>>> Spark 3.0 release, because supporting Java 9 is not a trivial task: we are
>>> using some internal APIs for memory management which have changed, so either
>>> we find a solution which works on both (but I am not sure it is feasible) or
>>> we have to switch between two implementations according to the Java version.
>>> So I'd rather avoid doing this in a non-major release.
>>> Thanks,
>>> Marco
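
If we do end up switching between two implementations, one straightforward shape for it
(sketched here with hypothetical names, not anything in Spark's code) is to pick a
backend from java.specification.version once at startup:

    // Hypothetical sketch of a version-switched backend; names are made up.
    trait MemoryBackend { def allocate(numBytes: Long): Long }

    object MemoryBackend {
      // "1.8" on Java 8; "9", "10", "11", ... on later releases.
      private val spec = System.getProperty("java.specification.version")
      private val major =
        if (spec.startsWith("1.")) spec.stripPrefix("1.").toInt else spec.toInt

      // Choose the implementation once, based on the running JVM.
      def apply(): MemoryBackend =
        if (major >= 9) new Java9PlusBackend else new Java8Backend
    }

    // Placeholders standing in for the two internal-API code paths:
    class Java8Backend extends MemoryBackend {
      def allocate(numBytes: Long): Long = ??? // pre-Java-9 path
    }
    class Java9PlusBackend extends MemoryBackend {
      def allocate(numBytes: Long): Long = ??? // post-Jigsaw path
    }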
>>> 2018-04-05 17:35 GMT+02:00 Mark Hamstra <>:
>>>> As with Sean, I'm not sure that this will require a new major version, but
>>>> we should also be looking at Java 9 & 10 support -- particularly with regard
>>>> to their better functionality in a containerized environment (memory limits
>>>> from cgroups, not sysconf; support for cpusets). In that regard, we should
>>>> also be looking at using the latest Scala 2.11.x maintenance release in
>>>> current Spark branches.
>>>> On Thu, Apr 5, 2018 at 5:45 AM, Sean Owen <> wrote:
>>>>> On Wed, Apr 4, 2018 at 6:20 PM Reynold Xin <> wrote:
>>>>>> The primary motivating factor IMO for a major version bump is to support
>>>>>> Scala 2.12, which requires minor API breaking changes to Spark’s APIs.
>>>>>> Similar to Spark 2.0, I think there are also opportunities for other changes
>>>>>> that we know have been biting us for a long time but can’t be changed in
>>>>>> feature releases (to be clear, I’m actually not sure they are all good
>>>>>> ideas, but I’m writing them down as candidates for consideration):
>>>>> IIRC from looking at this, it is possible to support 2.11 and 2.12
>>>>> simultaneously. The cross-build already works now in 2.3.0. Barring some
>>>>> change needed to get 2.12 fully working -- and that may be the case -- it
>>>>> nearly works that way now.
>>>>> Compiling vs 2.11 and 2.12 does however result in some APIs that differ
>>>>> in byte code. However Scala itself isn't mutually compatible between 2.11
>>>>> and 2.12 anyway; that's never been promised as compatible.
>>>>> (Interesting question about what *Java* users should expect; they would
>>>>> see a difference in 2.11 vs 2.12 Spark APIs, but that has always been true.)
>>>>> I don't disagree with shooting for Spark 3.0, just saying I don't know that
>>>>> 2.12 support requires moving to 3.0. But, Spark 3.0 could consider dropping
>>>>> 2.11 support if needed to make supporting 2.12 less painful.
>> -- 
>> Marcelo