spark-dev mailing list archives

From Wenchen Fan <cloud0...@gmail.com>
Subject Re: Branch 2.4 is cut
Date Tue, 11 Sep 2018 02:34:39 GMT
Since it's not a clean revert, I've sent a PR to revert it from 2.4. Please
take a look, thanks!
https://github.com/apache/spark/pull/22388


On Tue, Sep 11, 2018 at 1:16 AM Ryan Blue <rblue@netflix.com> wrote:

> SPARK-24882 was committed in order to make some progress, with a note
> about following up with separate PRs. But the reason why all of the open
> discussions were happening on the same PR is that this was so close to the
> 2.4 branching. I wanted to make sure that either the redesign was finished
> or it didn't go into 2.4.
>
> There are major changes that need to happen for the next release, like
> updating the write path. I think it would be better not to change this in
> 2.4 only to make another major change in the next release.
>
> On Sun, Sep 9, 2018 at 10:41 PM Wenchen Fan <cloud0fan@gmail.com> wrote:
>
>> Strictly speaking, data source v2 is always half-finished until we mark
>> it as stable. We need some small milestones to move forward step by step.
>>
>> The redesign also happens in an incremental way. SPARK-24882 mostly focuses
>> on the "RDD" part of the API: the separation of reader factory and input
>> partitions, the introduction of ScanConfig, etc. Next, we want to focus on
>> the high-level abstraction and change the "table" part of the API.
>>
>> In my understanding, each PR should be self-contained. If we are OK with
>> having SPARK-24882 in master as an individual commit, I think it's also OK
>> to have it in branch 2.4.
>>
>> I've created https://issues.apache.org/jira/browse/SPARK-25390 to track
>> the new abstraction. It doesn't change the API a lot, but it updates the
>> streaming execution engine quite a bit.
>>
>> Thanks,
>> Wenchen
>>
>> On Mon, Sep 10, 2018 at 4:20 AM Ryan Blue <rblue@netflix.com> wrote:
>>
>>> Wenchen, can you hold off on the first RC?
>>>
>>> The half-finished changes from the redesign of the DataSourceV2 API are
>>> in master, added in SPARK-24882
>>> <https://github.com/apache/spark/pull/22009>, and are now in the 2.4
>>> branch. We've had a lot of good discussion since that PR was merged to
>>> update and fix the design, plus only one of the follow-ups on
>>> SPARK-25186 <https://issues.apache.org/jira/browse/SPARK-25186> is
>>> done. Clearly, the redesign was too large to get into 2.4 in so little time
>>> -- it was proposed about 10 days before the original branch date -- and I
>>> don't think it is a good idea to release half-finished major changes.
>>>
>>> The easiest solution is to revert SPARK-24882 in the release branch.
>>> That way we have minor changes in 2.4 and major changes in the next
>>> release, instead of major changes in both. What does everyone think?
>>>
>>> rb
>>>
>>> On Fri, Sep 7, 2018 at 10:37 AM shane knapp <sknapp@berkeley.edu> wrote:
>>>
>>>> ++joshrosen  (thanks for the help w/deploying the jenkins configs)
>>>>
>>>> the basic 2.4 builds are deployed and building!
>>>>
>>>> i haven't created (a) build(s) yet for scala 2.12...  i'll be
>>>> coordinating this w/the databricks folks next week.
>>>>
>>>> On Fri, Sep 7, 2018 at 9:53 AM, Dongjoon Hyun <dongjoon.hyun@gmail.com>
>>>> wrote:
>>>>
>>>>> Thank you, Shane! :D
>>>>>
>>>>> Bests,
>>>>> Dongjoon.
>>>>>
>>>>> On Fri, Sep 7, 2018 at 9:51 AM shane knapp <sknapp@berkeley.edu>
>>>>> wrote:
>>>>>
>>>>>> i'll try and get to the 2.4 branch stuff today...
>>>>>>
>>>>>>
>>>>
>>>>
>>>> --
>>>> Shane Knapp
>>>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>>>> https://rise.cs.berkeley.edu
>>>>
>>>
>>>
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Netflix
>>>
>>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>
