spark-dev mailing list archives

From Wenchen Fan <cloud0...@gmail.com>
Subject Re: Branch 2.4 is cut
Date Mon, 10 Sep 2018 05:41:46 GMT
Strictly speaking, data source v2 will remain half-finished until we mark
it as stable. We need some small milestones to move forward step by step.

The redesign is also happening incrementally. SPARK-24882 mostly focuses
on the "RDD" part of the API: the separation of reader factory and input
partitions, the introduction of ScanConfig, etc. Next, we will focus on the
high-level abstraction and change the "table" part of the API.

In my understanding, each PR should be self-contained. If we are OK with
having SPARK-24882 in master as an individual commit, I think it's also OK
to have it in branch 2.4.

I've created https://issues.apache.org/jira/browse/SPARK-25390 to track the
new abstraction. It doesn't change the API a lot, but it updates the
streaming execution engine quite a bit.

Thanks,
Wenchen

On Mon, Sep 10, 2018 at 4:20 AM Ryan Blue <rblue@netflix.com> wrote:

> Wenchen, can you hold off on the first RC?
>
> The half-finished changes from the redesign of the DataSourceV2 API are in
> master, added in SPARK-24882 <https://github.com/apache/spark/pull/22009>,
> and are now in the 2.4 branch. We've had a lot of good discussion since
> that PR was merged to update and fix the design, plus only one of the
> follow-ups on SPARK-25186
> <https://issues.apache.org/jira/browse/SPARK-25186> is done. Clearly, the
> redesign was too large to get into 2.4 in so little time -- it was proposed
> about 10 days before the original branch date -- and I don't think it is a
> good idea to release half-finished major changes.
>
> The easiest solution is to revert SPARK-24882 in the release branch. That
> way we have minor changes in 2.4 and major changes in the next release,
> instead of major changes in both. What does everyone think?
>
> rb
>
> On Fri, Sep 7, 2018 at 10:37 AM shane knapp <sknapp@berkeley.edu> wrote:
>
>> ++joshrosen  (thanks for the help w/deploying the jenkins configs)
>>
>> the basic 2.4 builds are deployed and building!
>>
>> i haven't created (a) build(s) yet for scala 2.12...  i'll be
>> coordinating this w/the databricks folks next week.
>>
>> On Fri, Sep 7, 2018 at 9:53 AM, Dongjoon Hyun <dongjoon.hyun@gmail.com>
>> wrote:
>>
>>> Thank you, Shane! :D
>>>
>>> Bests,
>>> Dongjoon.
>>>
>>> On Fri, Sep 7, 2018 at 9:51 AM shane knapp <sknapp@berkeley.edu> wrote:
>>>
>>>> i'll try and get to the 2.4 branch stuff today...
>>>>
>>>>
>>
>>
>> --
>> Shane Knapp
>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>> https://rise.cs.berkeley.edu
>>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>
