spark-dev mailing list archives

From Matei Zaharia <matei.zaha...@gmail.com>
Subject Re: time for Apache Spark 3.0?
Date Thu, 05 Apr 2018 17:30:12 GMT
Sorry, but just to be clear here, this is the 2.12 API issue: https://issues.apache.org/jira/browse/SPARK-14643,
with more details in this doc: https://docs.google.com/document/d/1P_wmH3U356f079AYgSsN53HKixuNdxSEvo8nw_tgLgM/edit.

Basically, if we are allowed to change Spark’s API a little to have only one version of
methods that are currently overloaded between Java and Scala, we can get away with a single
source tree for all Scala versions and Java ABI compatibility against any type of Spark (whether
using Scala 2.11 or 2.12). On the other hand, if we want to keep the API and ABI of the Spark
2.x branch, we’ll need a different source tree for Scala 2.12 with different copies of pretty
large classes such as RDD, DataFrame and DStream, and Java users may have to change their
code when linking against different versions of Spark.
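
As a rough illustration of the "only one version of each method" option, here is a
minimal Java sketch. The names MiniDataset and MapFn are hypothetical stand-ins and are
not Spark classes; the sketch assumes nothing about how Spark itself would implement the
change. It only shows that when a method has a single signature taking a functional
interface, a plain lambda resolves to it without casts. If a second overload taking
another functional interface of the same shape were added, the same implicitly typed
lambda call would be rejected by javac as ambiguous, which is loosely analogous to the
overload problem described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SingleMethodSketch {

    // Hypothetical functional interface standing in for a Java-friendly function type.
    @FunctionalInterface
    interface MapFn<T, U> {
        U call(T value);
    }

    // Hypothetical miniature dataset; NOT Spark's Dataset or RDD.
    static final class MiniDataset<T> {
        private final List<T> rows;

        MiniDataset(List<T> rows) {
            this.rows = new ArrayList<>(rows);
        }

        // The only version of map: one signature, so a lambda argument is unambiguous.
        <U> MiniDataset<U> map(MapFn<T, U> f) {
            List<U> out = new ArrayList<>();
            for (T row : rows) {
                out.add(f.call(row));
            }
            return new MiniDataset<>(out);
        }

        // Returns a copy of the rows held by this dataset.
        List<T> collect() {
            return new ArrayList<>(rows);
        }
    }

    public static void main(String[] args) {
        MiniDataset<Integer> ds = new MiniDataset<>(Arrays.asList(1, 2, 3));
        // No cast is needed because map has exactly one signature.
        System.out.println(ds.map(x -> x * 2).collect()); // prints [2, 4, 6]
    }
}
```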

This is of course only one of the possible ABI changes, but it is a considerable engineering
effort, so we’d have to sign up for maintaining all these different source files. It seems
kind of silly given that Scala 2.12 was released in 2016, so we’re doing all this work to
keep ABI compatibility for Scala 2.11, which isn’t even that widely used any more for new
projects. Also keep in mind that the next Spark release will probably take at least 3-4 months,
so we’re talking about what people will be using in fall 2018.

Matei

> On Apr 5, 2018, at 10:13 AM, Marcelo Vanzin <vanzin@cloudera.com> wrote:
> 
> I remember seeing somewhere that Scala still has some issues with Java
> 9/10 so that might be hard...
> 
> But on that topic, it might be better to shoot for Java 11
> compatibility. 9 and 10, following the new release model, aren't
> really meant to be long-term releases.
> 
> In general, agree with Sean here. Doesn't look like 2.12 support
> requires unexpected API breakages. So unless there's a really good
> reason to break / remove a bunch of existing APIs...
> 
> On Thu, Apr 5, 2018 at 9:04 AM, Marco Gaido <marcogaido91@gmail.com> wrote:
>> Hi all,
>> 
>> I also agree with Mark that we should add Java 9/10 support to an eventual
>> Spark 3.0 release, because supporting Java 9 is not a trivial task since we
>> are using some internal APIs for the memory management which changed: either
>> we find a solution which works on both (but I am not sure it is feasible) or
>> we have to switch between 2 implementations according to the Java version.
>> So I'd rather avoid doing this in a non-major release.
>> 
>> Thanks,
>> Marco
>> 
>> 
>> 2018-04-05 17:35 GMT+02:00 Mark Hamstra <mark@clearstorydata.com>:
>>> 
>>> As with Sean, I'm not sure that this will require a new major version, but
>>> we should also be looking at Java 9 & 10 support -- particularly with regard
>>> to their better functionality in a containerized environment (memory limits
>>> from cgroups, not sysconf; support for cpusets). In that regard, we should
>>> also be looking at using the latest Scala 2.11.x maintenance release in
>>> current Spark branches.
>>> 
>>> On Thu, Apr 5, 2018 at 5:45 AM, Sean Owen <srowen@gmail.com> wrote:
>>>> 
>>>> On Wed, Apr 4, 2018 at 6:20 PM Reynold Xin <rxin@databricks.com> wrote:
>>>>> 
>>>>> The primary motivating factor IMO for a major version bump is to support
>>>>> Scala 2.12, which requires minor API breaking changes to Spark’s APIs.
>>>>> Similar to Spark 2.0, I think there are also opportunities for other changes
>>>>> that we know have been biting us for a long time but can’t be changed in
>>>>> feature releases (to be clear, I’m actually not sure they are all good
>>>>> ideas, but I’m writing them down as candidates for consideration):
>>>> 
>>>> 
>>>> IIRC from looking at this, it is possible to support 2.11 and 2.12
>>>> simultaneously. The cross-build already works now in 2.3.0. Barring some big
>>>> change needed to get 2.12 fully working -- and that may be the case -- it
>>>> nearly works that way now.
>>>> 
>>>> Compiling vs 2.11 and 2.12 does however result in some APIs that differ
>>>> in byte code. However Scala itself isn't mutually compatible between 2.11
>>>> and 2.12 anyway; that's never been promised as compatible.
>>>> 
>>>> (Interesting question about what *Java* users should expect; they would
>>>> see a difference in 2.11 vs 2.12 Spark APIs, but that has always been true.)
>>>> 
>>>> I don't disagree with shooting for Spark 3.0, just saying I don't know if
>>>> 2.12 support requires moving to 3.0. But, Spark 3.0 could consider dropping
>>>> 2.11 support if needed to make supporting 2.12 less painful.
>>> 
>>> 
>> 
> 
> 
> 
> -- 
> Marcelo
> 
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
> 



