spark-dev mailing list archives

From Ryan Blue <>
Subject Re: time for Apache Spark 3.0?
Date Thu, 06 Sep 2018 20:49:47 GMT
I meant flexibility beyond the point releases. I think what Reynold was
suggesting was getting v2 code out more often than the point releases every
6 months. An Evolving API can change in point releases, but maybe we should
move v2 to Unstable so it can change more often? I don't really see another
way to get changes out more often.

On Thu, Sep 6, 2018 at 11:07 AM Mark Hamstra <> wrote:

> Yes, that is why we have these annotations in the code and the
> corresponding labels appearing in the API documentation:
> As long as it is properly annotated, we can change or even eliminate an
> API method before the next major release. And frankly, we shouldn't be
> contemplating bringing in the DS v2 API (and, I'd argue, *any* new API)
> without such an annotation. There is just too much risk of not getting
> everything right before we see the results of the new API being more widely
> used, and too much cost in maintaining until the next major release
> something that we come to regret for us to create new API in a fully frozen
> state.
> On Thu, Sep 6, 2018 at 9:49 AM Ryan Blue <>
> wrote:
>> It would be great to get more features out incrementally. For
>> experimental features, do we have more relaxed constraints?
>> On Thu, Sep 6, 2018 at 9:47 AM Reynold Xin <> wrote:
>>> +1 on 3.0
>>> Dsv2 stable can still evolve across major releases. DataFrame,
>>> Dataset, dsv1 and a lot of other major features all were developed
>>> throughout the 1.x and 2.x lines.
>>> I do want to explore ways for us to get dsv2 incremental changes out
>>> there more frequently, to get feedback. Maybe that means we apply additive
>>> changes to 2.4.x; maybe that means making another 2.5 release sooner. I
>>> will start a separate thread about it.
>>> On Thu, Sep 6, 2018 at 9:31 AM Sean Owen <> wrote:
>>>> I think this doesn't necessarily mean 3.0 is coming soon (thoughts on
>>>> timing? 6 months?) but simply next. Do you mean you'd prefer that change
>>>> happen before 3.x? If it's a significant change, seems reasonable for a
>>>> major version bump rather than minor. Is the concern that tying it to 3.0
>>>> means you have to take a major version update to get it?
>>>> I generally support moving on to 3.x so we can also jettison a lot of
>>>> older dependencies, code, fix some long-standing issues, etc.
>>>> (BTW Scala 2.12 support, mentioned in the OP, will go in for 2.4)
>>>> On Thu, Sep 6, 2018 at 9:10 AM Ryan Blue <>
>>>> wrote:
>>>>> My concern is that the v2 data source API is still evolving and not
>>>>> very close to stable. I had hoped to have stabilized the API and behaviors
>>>>> for a 3.0 release. But we could also wait on that for a 4.0 release,
>>>>> depending on when we think that will be.
>>>>> Unless there is a pressing need to move to 3.0 for some other area, I
>>>>> think it would be better for the v2 sources to have a 2.5 release.
>>>>> On Thu, Sep 6, 2018 at 8:59 AM Xiao Li <> wrote:
>>>>>> Yesterday, the 2.4 branch was created. Based on the above discussion,
>>>>>> I think we can bump the master branch to 3.0.0-SNAPSHOT. Any concern?
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix

Ryan Blue
Software Engineer
