spark-dev mailing list archives

From Felix Cheung <felixcheun...@hotmail.com>
Subject Re: Branch 2.4 is cut
Date Mon, 10 Sep 2018 09:15:23 GMT
I’m a bit concerned about what Arun is summarizing.

We are building on DSv2 and have already had to rewrite for a bunch of changes in master/2.4,
which increases the cost of dev work and release management.

If we are saying more changes are coming in 3.0, do we have more info on what value the current
changes in 2.4 are adding now?



________________________________
From: Wenchen Fan <cloud0fan@gmail.com>
Sent: Monday, September 10, 2018 12:35 AM
To: arunm@apache.org
Cc: Ryan Blue; sknapp@berkeley.edu; Dongjoon Hyun; joshrosen@databricks.com; Sean Owen; Spark
dev list
Subject: Re: Branch 2.4 is cut

We made a lot of "breaking" changes in 2.4 for data source v2, though I agree SPARK-24882
is the most "breaking" one.

I don't agree SPARK-24882 is half-baked. But I'm willing to revert it if we have a bunch of
data source v2 users and they are not willing to make heavy updates to their implementations
before the data source v2 API is stabilized.

On Mon, Sep 10, 2018 at 2:55 PM Arun Mahadevan <arunm@apache.org> wrote:
Ryan's proposal makes a lot of sense. It's better not to release half-baked changes in 2.4
that not only break a lot of the APIs released in 2.3 but are also expected to change further
due to redesigns before 3.0, so I don't see much value in releasing them in 2.4.

On Sun, 9 Sep 2018 at 22:42, Wenchen Fan <cloud0fan@gmail.com> wrote:
Strictly speaking, data source v2 is always half-finished until we mark it as stable. We need
some small milestones to move forward step by step.

The redesign also happens in an incremental way. SPARK-24882 mostly focuses on the "RDD" part
of the API: the separation of reader factory and input partitions, the introduction of ScanConfig,
etc. Then we focus on the high-level abstraction and want to change the "table" part of the
API.

In my understanding, each PR should be self-contained. If we are OK to have SPARK-24882 in
master as an individual commit, I think it's also OK to have it in branch 2.4.

I've created https://issues.apache.org/jira/browse/SPARK-25390 to track the new abstraction.
It doesn't change the API a lot, but it updates the streaming execution engine quite a bit.

Thanks,
Wenchen

On Mon, Sep 10, 2018 at 4:20 AM Ryan Blue <rblue@netflix.com> wrote:
Wenchen, can you hold off on the first RC?

The half-finished changes from the redesign of the DataSourceV2 API are in master, added in
SPARK-24882<https://github.com/apache/spark/pull/22009>, and are now in the 2.4 branch.
We've had a lot of good discussion since that PR was merged to update and fix the design,
plus only one of the follow-ups on SPARK-25186<https://issues.apache.org/jira/browse/SPARK-25186>
is done. Clearly, the redesign was too large to get into 2.4 in so little time -- it was proposed
about 10 days before the original branch date -- and I don't think it is a good idea to release
half-finished major changes.

The easiest solution is to revert SPARK-24882 in the release branch. That way we have minor
changes in 2.4 and major changes in the next release, instead of major changes in both. What
does everyone think?

rb

On Fri, Sep 7, 2018 at 10:37 AM shane knapp <sknapp@berkeley.edu> wrote:
++joshrosen  (thanks for the help w/deploying the jenkins configs)

the basic 2.4 builds are deployed and building!

i haven't created (a) build(s) yet for scala 2.12...  i'll be coordinating this w/the databricks
folks next week.

On Fri, Sep 7, 2018 at 9:53 AM, Dongjoon Hyun <dongjoon.hyun@gmail.com> wrote:
Thank you, Shane! :D

Bests,
Dongjoon.

On Fri, Sep 7, 2018 at 9:51 AM shane knapp <sknapp@berkeley.edu> wrote:
i'll try and get to the 2.4 branch stuff today...




--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


--
Ryan Blue
Software Engineer
Netflix
