spark-dev mailing list archives

From Arun Mahadevan <ar...@apache.org>
Subject Re: Branch 2.4 is cut
Date Mon, 10 Sep 2018 06:55:13 GMT
Ryan's proposal makes a lot of sense. It's better not to release half-baked
changes in 2.4 that not only break a lot of the APIs released in 2.3 but are
also expected to change further due to redesigns before 3.0, so I don't see
much value in releasing them in 2.4.

On Sun, 9 Sep 2018 at 22:42, Wenchen Fan <cloud0fan@gmail.com> wrote:

> Strictly speaking, data source v2 is always half-finished until we mark it
> as stable. We need some small milestones to move forward step by step.
>
> The redesign also happens in an incremental way. SPARK-24882 mostly focuses
> on the "RDD" part of the API: the separation of reader factory and input
> partitions, the introduction of ScanConfig, etc. Then we focus on the
> high-level abstraction and want to change the "table" part of the API.
>
> In my understanding, each PR should be self-contained. If we are OK to
> have SPARK-24882 in master as an individual commit, I think it's also OK to
> have it in branch 2.4.
>
> I've created https://issues.apache.org/jira/browse/SPARK-25390 to track
> the new abstraction. It doesn't change the API a lot, but updates the
> streaming execution engine quite a bit.
>
> Thanks,
> Wenchen
>
> On Mon, Sep 10, 2018 at 4:20 AM Ryan Blue <rblue@netflix.com> wrote:
>
>> Wenchen, can you hold off on the first RC?
>>
>> The half-finished changes from the redesign of the DataSourceV2 API are
>> in master, added in SPARK-24882
>> <https://github.com/apache/spark/pull/22009>, and are now in the 2.4
>> branch. We've had a lot of good discussion since that PR was merged to
>> update and fix the design, plus only one of the follow-ups on SPARK-25186
>> <https://issues.apache.org/jira/browse/SPARK-25186> is done. Clearly,
>> the redesign was too large to get into 2.4 in so little time -- it was
>> proposed about 10 days before the original branch date -- and I don't think
>> it is a good idea to release half-finished major changes.
>>
>> The easiest solution is to revert SPARK-24882 in the release branch. That
>> way we have minor changes in 2.4 and major changes in the next release,
>> instead of major changes in both. What does everyone think?
>>
>> rb
>>
>> On Fri, Sep 7, 2018 at 10:37 AM shane knapp <sknapp@berkeley.edu> wrote:
>>
>>> ++joshrosen  (thanks for the help w/deploying the jenkins configs)
>>>
>>> the basic 2.4 builds are deployed and building!
>>>
>>> i haven't created (a) build(s) yet for scala 2.12...  i'll be
>>> coordinating this w/the databricks folks next week.
>>>
>>> On Fri, Sep 7, 2018 at 9:53 AM, Dongjoon Hyun <dongjoon.hyun@gmail.com>
>>> wrote:
>>>
>>>> Thank you, Shane! :D
>>>>
>>>> Bests,
>>>> Dongjoon.
>>>>
>>>> On Fri, Sep 7, 2018 at 9:51 AM shane knapp <sknapp@berkeley.edu> wrote:
>>>>
>>>>> i'll try and get to the 2.4 branch stuff today...
>>>>>
>>>>>
>>>
>>>
>>> --
>>> Shane Knapp
>>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>>> https://rise.cs.berkeley.edu
>>>
>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
>
