spark-dev mailing list archives

From Reynold Xin <r...@databricks.com>
Subject Re: time for Apache Spark 3.0?
Date Mon, 12 Nov 2018 23:58:08 GMT
All API removal and deprecation JIRAs should be tagged "releasenotes", so
we can reference them when we build release notes. I don't know if
everybody is still following that practice, but it'd be great to do that.
Since we don't have that many PRs, we should still be able to retroactively
tag.

We can also add a new tag for API changes, but I feel at this stage it
might be easier to just use "releasenotes".
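For tracking, a JIRA filter along these lines should surface the tagged tickets (the fixVersion clause is an assumption, not something specified in this thread):

```
project = SPARK AND labels = releasenotes AND fixVersion = 3.0.0 ORDER BY key
```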


On Mon, Nov 12, 2018 at 3:49 PM Matt Cheah <mcheah@palantir.com> wrote:

> I wanted to clarify what categories of APIs are eligible to be broken in
> Spark 3.0. Specifically:
>
>
>
>    - Are we removing all deprecated methods? If we’re only removing some
>    subset of deprecated methods, what is that subset? I see a bunch were
>    removed in https://github.com/apache/spark/pull/22921 for example. Are
>    we only committed to removing methods that were deprecated in some Spark
>    version and earlier?
>    - Aside from removing support for Scala 2.11, what other kinds of
>    (non-experimental and non-evolving) APIs are eligible to be broken?
>    - Is there going to be a way to track the current list of all proposed
>    breaking changes / JIRA tickets? Perhaps we can include something in the
>    JIRA tickets that can be filtered on somehow?
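(Inline, on the first bullet: removal sweeps typically key on the version recorded with the deprecation. A minimal sketch of the idea, with made-up names rather than actual Spark API:)

```java
// Sketch: a "remove everything deprecated as of version X" pass keys on
// the version recorded alongside @Deprecated. Names here are invented,
// not actual Spark API.
class LegacyApi {
    /** @deprecated since 2.0.0; use {@link #countFast} instead. */
    @Deprecated
    static int count(int[] xs) { return xs.length; }

    // Replacement that would survive into 3.0.
    static int countFast(int[] xs) { return xs.length; }
}
```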
>
>
>
> Thanks,
>
>
>
> -Matt Cheah
>
> *From: *Vinoo Ganesh <vganesh@palantir.com>
> *Date: *Monday, November 12, 2018 at 2:48 PM
> *To: *Reynold Xin <rxin@databricks.com>
> *Cc: *Xiao Li <gatorsmile@gmail.com>, Matei Zaharia <
> matei.zaharia@gmail.com>, Ryan Blue <rblue@netflix.com>, Mark Hamstra <
> mark@clearstorydata.com>, dev <dev@spark.apache.org>
> *Subject: *Re: time for Apache Spark 3.0?
>
>
>
> Makes sense, thanks Reynold.
>
>
>
> *From: *Reynold Xin <rxin@databricks.com>
> *Date: *Monday, November 12, 2018 at 16:57
> *To: *Vinoo Ganesh <vganesh@palantir.com>
> *Cc: *Xiao Li <gatorsmile@gmail.com>, Matei Zaharia <
> matei.zaharia@gmail.com>, Ryan Blue <rblue@netflix.com>, Mark Hamstra <
> mark@clearstorydata.com>, dev <dev@spark.apache.org>
> *Subject: *Re: time for Apache Spark 3.0?
>
>
>
> Master branch now tracks the 3.0.0-SNAPSHOT version, so the next release
> will be 3.0. In terms of timing, unless we change anything specifically,
> Spark feature releases are on a 6-month cadence. Spark 2.4 was just released
> last week, so 3.0 will be roughly 6 months from now.
>
>
>
> On Mon, Nov 12, 2018 at 1:54 PM Vinoo Ganesh <vganesh@palantir.com> wrote:
>
> Quickly following up on this – is there a target date for when Spark 3.0
> may be released and/or a list of the likely API breaks that are
> anticipated?
>
>
>
> *From: *Xiao Li <gatorsmile@gmail.com>
> *Date: *Saturday, September 29, 2018 at 02:09
> *To: *Reynold Xin <rxin@databricks.com>
> *Cc: *Matei Zaharia <matei.zaharia@gmail.com>, Ryan Blue <
> rblue@netflix.com>, Mark Hamstra <mark@clearstorydata.com>, "
> user@spark.apache.org" <dev@spark.apache.org>
> *Subject: *Re: time for Apache Spark 3.0?
>
>
>
> Yes. We should create a SPIP for each major breaking change.
>
>
>
> Reynold Xin <rxin@databricks.com> wrote on Friday, September 28, 2018 at 11:05 PM:
>
> i think we should create spips for some of them, since they are pretty
> large ... i can create some tickets to start with
>
>
> --
>
> excuse the brevity and lower case due to wrist injury
>
>
>
>
>
> On Fri, Sep 28, 2018 at 11:01 PM Xiao Li <gatorsmile@gmail.com> wrote:
>
> Based on the above discussions, we have a "rough consensus" that the next
> release will be 3.0. Now, we can start working on the API breaking changes
> (e.g., the ones mentioned in the original email from Reynold).
>
>
>
> Cheers,
>
>
>
> Xiao
>
>
>
> Matei Zaharia <matei.zaharia@gmail.com> wrote on Thursday, September 6, 2018 at 2:21 PM:
>
> Yes, you can start with Unstable and move to Evolving and Stable when
> needed. We’ve definitely had experimental features that changed across
> maintenance releases when they were well-isolated. If your change risks
> breaking stuff in stable components of Spark though, then it probably won’t
> be suitable for that.
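Matei's lifecycle above can be sketched with simplified stand-ins for Spark's InterfaceStability annotations (the real ones live in org.apache.spark.annotation; the API below is hypothetical):

```java
import java.lang.annotation.Documented;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Simplified stand-ins for Spark's InterfaceStability annotations;
// the real ones live in the common/tags module.
@Documented @Retention(RetentionPolicy.RUNTIME)
@interface Unstable {}

@Documented @Retention(RetentionPolicy.RUNTIME)
@interface Evolving {}

// A hypothetical new API, marked Unstable first: it may change in any
// release, including maintenance releases. Promotion to Evolving and
// then Stable happens as the API matures.
@Unstable
interface TableCatalog {
    String loadTable(String name);
}
```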
>
> > On Sep 6, 2018, at 1:49 PM, Ryan Blue <rblue@netflix.com.INVALID> wrote:
> >
> > I meant flexibility beyond the point releases. I think what Reynold was
> suggesting was getting v2 code out more often than the point releases every
> 6 months. An Evolving API can change in point releases, but maybe we should
> move v2 to Unstable so it can change more often? I don't really see another
> way to get changes out more often.
> >
> > On Thu, Sep 6, 2018 at 11:07 AM Mark Hamstra <mark@clearstorydata.com>
> wrote:
> > Yes, that is why we have these annotations in the code and the
> corresponding labels appearing in the API documentation: https://github.com/apache/spark/blob/master/common/tags/src/main/java/org/apache/spark/annotation/InterfaceStability.java
> >
> > As long as it is properly annotated, we can change or even eliminate an
> API method before the next major release. And frankly, we shouldn't be
> contemplating bringing in the DS v2 API (and, I'd argue, any new API)
> without such an annotation. There is just too much risk of not getting
> everything right before we see the results of the new API being more widely
> used, and too much cost in maintaining something we come to regret until
> the next major release, for us to create a new API in a fully frozen state.
> >
> >
> > On Thu, Sep 6, 2018 at 9:49 AM Ryan Blue <rblue@netflix.com.invalid>
> wrote:
> > It would be great to get more features out incrementally. For
> experimental features, do we have more relaxed constraints?
> >
> > On Thu, Sep 6, 2018 at 9:47 AM Reynold Xin <rxin@databricks.com> wrote:
> > +1 on 3.0
> >
> > Dsv2 stable can still evolve across major releases. DataFrame,
> Dataset, dsv1 and a lot of other major features all were developed
> throughout the 1.x and 2.x lines.
> >
> > I do want to explore ways for us to get dsv2 incremental changes out
> there more frequently, to get feedback. Maybe that means we apply additive
> changes to 2.4.x; maybe that means making another 2.5 release sooner. I
> will start a separate thread about it.
> >
> >
> >
> > On Thu, Sep 6, 2018 at 9:31 AM Sean Owen <srowen@gmail.com> wrote:
> > I think this doesn't necessarily mean 3.0 is coming soon (thoughts on
> timing? 6 months?) but simply next. Do you mean you'd prefer that change to
> happen before 3.x? if it's a significant change, seems reasonable for a
> major version bump rather than minor. Is the concern that tying it to 3.0
> means you have to take a major version update to get it?
> >
> > I generally support moving on to 3.x so we can also jettison a lot of
> older dependencies, code, fix some long standing issues, etc.
> >
> > (BTW Scala 2.12 support, mentioned in the OP, will go in for 2.4)
> >
> > On Thu, Sep 6, 2018 at 9:10 AM Ryan Blue <rblue@netflix.com.invalid>
> wrote:
> > My concern is that the v2 data source API is still evolving and not very
> close to stable. I had hoped to have stabilized the API and behaviors for a
> 3.0 release. But we could also wait on that for a 4.0 release, depending on
> when we think that will be.
> >
> > Unless there is a pressing need to move to 3.0 for some other area, I
> think it would be better for the v2 sources to have a 2.5 release.
> >
> > On Thu, Sep 6, 2018 at 8:59 AM Xiao Li <gatorsmile@gmail.com> wrote:
> > Yesterday, the 2.4 branch was created. Based on the above discussion, I
> think we can bump the master branch to 3.0.0-SNAPSHOT. Any concern?
> >
> >
> >
> > --
> > Ryan Blue
> > Software Engineer
> > Netflix
> >
> >
> > --
> > Ryan Blue
> > Software Engineer
> > Netflix
>
>
>
>
