spark-dev mailing list archives

From Matei Zaharia <matei.zaha...@gmail.com>
Subject Re: time for Apache Spark 3.0?
Date Thu, 05 Apr 2018 17:04:53 GMT
Java 9/10 support would be great to add as well.

Regarding Scala 2.12, my understanding was that supporting it becomes easier if we change
the Spark API and ABI slightly. It is of course possible to create an alternate source tree
today, but we might be able to share the same source files if we tweak some small things
in the methods that are overloaded across Scala and Java. I don’t remember the exact details,
but the idea was to reduce the total maintenance work needed at the cost of requiring users
to recompile their apps.
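[Editor's note: for illustration, here is a schematic sketch of the overload pattern in question, using simplified, hypothetical stand-in types rather than Spark's actual signatures. A method overloaded on a Scala function type and a Java functional interface makes a lambda argument ambiguous for Java callers today, and Scala 2.12's SAM conversion introduces the same ambiguity for Scala callers.]

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Schematic sketch: hypothetical stand-ins for scala.Function1 and
// Spark's Java-facing functional interface, not the real classes.
public class OverloadSketch {
    interface ScalaFunction1<T, R> { R apply(T t); }  // stand-in for scala.Function1
    interface MapFunction<T, R> { R call(T t); }      // stand-in for the Java API type

    static class Dataset<T> {
        final List<T> data;
        Dataset(List<T> data) { this.data = data; }
        <U> Dataset<U> map(ScalaFunction1<T, U> f) {
            List<U> out = new ArrayList<>();
            for (T t : data) out.add(f.apply(t));
            return new Dataset<>(out);
        }
        <U> Dataset<U> map(MapFunction<T, U> f) {
            List<U> out = new ArrayList<>();
            for (T t : data) out.add(f.call(t));
            return new Dataset<>(out);
        }
    }

    public static void main(String[] args) {
        Dataset<Integer> ds = new Dataset<>(Arrays.asList(1, 2, 3));
        // ds.map(x -> x + 1) would NOT compile: the lambda matches both
        // overloads, so neither is more specific. Callers must cast to
        // disambiguate -- removing one overload (an API/ABI tweak) avoids this.
        Dataset<Integer> out = ds.map((MapFunction<Integer, Integer>) x -> x + 1);
        System.out.println(out.data);  // prints [2, 3, 4]
    }
}
```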

I’m personally for moving to 3.0 because of the other things we can clean up as well, e.g.
the default SQL dialect, Iterable stuff, and possibly dependency shading (a major pain point
for lots of users). It’s also a chance to highlight Kubernetes, continuous processing and
other features more if they become "GA".

Matei

> On Apr 5, 2018, at 9:04 AM, Marco Gaido <marcogaido91@gmail.com> wrote:
> 
> Hi all,
> 
> I also agree with Mark that we should add Java 9/10 support to an eventual Spark 3.0
> release. Supporting Java 9 is not a trivial task, since we use some internal
> memory-management APIs that have changed: either we find a solution that works on both
> (and I am not sure that is feasible), or we switch between two implementations
> according to the Java version.
> So I'd rather avoid doing this in a non-major release.
> 
> Thanks,
> Marco
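[Editor's note: a minimal sketch of the switch Marco describes -- detect the JVM's major version at runtime and select a memory-management path accordingly. The class and implementation names here are hypothetical, not Spark's.]

```java
public class MemoryImplSelector {
    /** Parses java.specification.version: "1.8" -> 8, "9" -> 9, "10" -> 10. */
    static int majorJavaVersion(String spec) {
        // Pre-9 versions use the "1.x" scheme; 9+ report the major number first.
        String v = spec.startsWith("1.") ? spec.substring(2) : spec;
        int dot = v.indexOf('.');
        return Integer.parseInt(dot < 0 ? v : v.substring(0, dot));
    }

    /** Hypothetical switch between two memory-management implementations. */
    static String chooseImpl(int major) {
        // Java 9 changed/encapsulated the internal APIs in question,
        // so 9+ would need a different code path than 8.
        return major >= 9 ? "jdk9-plus-impl" : "legacy-unsafe-impl";
    }

    public static void main(String[] args) {
        int major = majorJavaVersion(System.getProperty("java.specification.version"));
        System.out.println("Java " + major + " -> " + chooseImpl(major));
    }
}
```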
> 
> 
> 2018-04-05 17:35 GMT+02:00 Mark Hamstra <mark@clearstorydata.com>:
> As with Sean, I'm not sure that this will require a new major version, but we should
> also be looking at Java 9 & 10 support -- particularly with regard to their better
> functionality in a containerized environment (memory limits from cgroups, not sysconf;
> support for cpusets). In that regard, we should also be looking at using the latest
> Scala 2.11.x maintenance release in current Spark branches.
> 
> On Thu, Apr 5, 2018 at 5:45 AM, Sean Owen <srowen@gmail.com> wrote:
> On Wed, Apr 4, 2018 at 6:20 PM Reynold Xin <rxin@databricks.com> wrote:
> The primary motivating factor IMO for a major version bump is to support Scala 2.12,
> which requires minor API breaking changes to Spark’s APIs. Similar to Spark 2.0, I think
> there are also opportunities for other changes that we know have been biting us for a
> long time but can’t be changed in feature releases (to be clear, I’m actually not sure
> they are all good ideas, but I’m writing them down as candidates for consideration):
> 
> IIRC from looking at this, it is possible to support 2.11 and 2.12 simultaneously. The
> cross-build already works now in 2.3.0. Barring some big change needed to get 2.12
> fully working -- and that may be the case -- it nearly works that way now.
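[Editor's note: a hedged sketch of exercising the cross-build Sean mentions, assuming the dev/change-scala-version.sh script and an experimental scala-2.12 Maven profile in that era's branches; exact profile names may differ.]

```shell
# Switch the build's Scala version, then compile against 2.12.
# Assumes a Spark 2.3-era checkout with the experimental scala-2.12 profile.
./dev/change-scala-version.sh 2.12
./build/mvn -DskipTests -Pscala-2.12 clean package
```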
> 
> Compiling against 2.11 and 2.12 does, however, result in some APIs that differ in byte
> code. But Scala itself isn't mutually compatible between 2.11 and 2.12 anyway; that has
> never been promised.
> 
> (Interesting question about what *Java* users should expect; they would see a
> difference in 2.11 vs 2.12 Spark APIs, but that has always been true.)
> 
> I don't disagree with shooting for Spark 3.0, just saying I don't know if 2.12 support
> requires moving to 3.0. But Spark 3.0 could consider dropping 2.11 support if needed to
> make supporting 2.12 less painful.
> 
> 


---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org

