spark-dev mailing list archives

From Denny Lee <denny.g....@gmail.com>
Subject Re: [VOTE] Amend Spark's Semantic Versioning Policy
Date Mon, 09 Mar 2020 15:33:32 GMT
+1 (non-binding)

On Mon, Mar 9, 2020 at 1:59 AM Hyukjin Kwon <gurwls223@gmail.com> wrote:

> The proposal itself seems good as a set of factors to consider. Thanks, Michael.
>
> Several of the concerns mentioned are good points, in particular:
>
> > ... assuming that this is for public stable APIs, not APIs that are
> > marked as unstable, evolving, etc. ...
> I would like to confirm this. We already have API annotations such as
> Experimental, Unstable, etc., and the implications of each still apply.
> If this is for stable APIs, it makes sense to me as well.
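>
> As a minimal sketch of the distinction (the reader classes below are
> hypothetical; the annotations themselves are the real ones in
> org.apache.spark.annotation, assuming the spark-tags artifact is on the
> classpath):
>
>     import org.apache.spark.annotation.{Evolving, Experimental, Stable}
>
>     @Stable       // stable public API: the proposed rubric applies before breaking it
>     class StableReader { def read(path: String): Unit = () }
>
>     @Evolving     // meant to become stable, but can still change between releases
>     class EvolvingReader { def read(path: String): Unit = () }
>
>     @Experimental // experimental; may change or be removed in future releases
>     class ExperimentalReader { def read(path: String): Unit = () }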
>
> > ... can we expand on 'when' an API change can occur? Since we are
> > proposing to diverge from semver. ...
> I think this is a good point. If we're proposing to diverge from semver,
> the delta compared to semver will have to be clarified to avoid different
> personal interpretations of the somewhat general principles.
>
> > ... can we narrow down on the migration from Apache Spark 2.4.5 to
> > Apache Spark 3.0+? ...
>
> Assuming these concerns will be addressed, +1 (binding).
>
>
> On Mon, Mar 9, 2020 at 4:53 PM Takeshi Yamamuro <linguin.m.s@gmail.com>
> wrote:
>
>> +1 (non-binding)
>>
>> Bests,
>> Takeshi
>>
>> On Mon, Mar 9, 2020 at 4:52 PM Gengliang Wang <
>> gengliang.wang@databricks.com> wrote:
>>
>>> +1 (non-binding)
>>>
>>> Gengliang
>>>
>>> On Mon, Mar 9, 2020 at 12:22 AM Matei Zaharia <matei.zaharia@gmail.com>
>>> wrote:
>>>
>>>> +1 as well.
>>>>
>>>> Matei
>>>>
>>>> On Mar 9, 2020, at 12:05 AM, Wenchen Fan <cloud0fan@gmail.com> wrote:
>>>>
>>>> +1 (binding), assuming that this is for public stable APIs, not APIs
>>>> that are marked as unstable, evolving, etc.
>>>>
>>>> On Mon, Mar 9, 2020 at 1:10 AM Ismaël Mejía <iemejia@gmail.com> wrote:
>>>>
>>>>> +1 (non-binding)
>>>>>
>>>>> Michael's section on the trade-offs of maintaining / removing an API
>>>>> is one of the best reads I have seen on this mailing list.
>>>>> Enthusiastic +1
>>>>>
>>>>> On Sat, Mar 7, 2020 at 8:28 PM Dongjoon Hyun <dongjoon.hyun@gmail.com>
>>>>> wrote:
>>>>> >
>>>>> > This new policy has a good intention, but can we narrow down on the
>>>>> migration from Apache Spark 2.4.5 to Apache Spark 3.0+?
>>>>> >
>>>>> > I saw that there is already a revert PR to bring back Spark 1.4
>>>>> and 1.5 APIs based on this AS-IS suggestion.
>>>>> >
>>>>> > The AS-IS policy clearly mentions the JVM/Scala-level difficulty,
>>>>> which is nice.
>>>>> >
>>>>> > However, for the other cases, it sounds like `recommending older
>>>>> APIs as much as possible` due to the following factor:
>>>>> >
>>>>> >      > How long has the API been in Spark?
>>>>> >
>>>>> > We had better be more careful when we add a new policy, and we
>>>>> should aim not to mislead users and 3rd-party library developers
>>>>> into thinking "older is better".
>>>>> >
>>>>> > Technically, I'm wondering who will use new APIs in their examples
>>>>> (in books and on StackOverflow) if they always need to add a warning
>>>>> like `this only works on 2.4.0+`.
>>>>> >
>>>>> > Bests,
>>>>> > Dongjoon.
>>>>> >
>>>>> > On Fri, Mar 6, 2020 at 7:10 PM Mridul Muralidharan <mridul@gmail.com>
>>>>> wrote:
>>>>> >>
>>>>> >> I am in broad agreement with the proposal; like any developer, I
>>>>> >> prefer stable, well-designed APIs :-)
>>>>> >>
>>>>> >> Can we tie the proposal to the stability guarantees given by
>>>>> >> Spark and to reasonable expectations from users?
>>>>> >> In my opinion, an unstable or evolving API could change, while an
>>>>> >> experimental API which has been around for ages should be handled
>>>>> >> more conservatively.
>>>>> >> This raises the question of how the stability guarantees specified
>>>>> >> by the annotations interact with the proposal.
>>>>> >>
>>>>> >> Also, can we expand on 'when' an API change can occur, since we
>>>>> >> are proposing to diverge from semver? Patch release? Minor
>>>>> >> release? Only major release? Based on the 'impact' of the API?
>>>>> >> Stability guarantees?
>>>>> >>
>>>>> >> Regards,
>>>>> >> Mridul
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> On Fri, Mar 6, 2020 at 7:01 PM Michael Armbrust <
>>>>> michael@databricks.com> wrote:
>>>>> >> >
>>>>> >> > I'll start off the vote with a strong +1 (binding).
>>>>> >> >
>>>>> >> > On Fri, Mar 6, 2020 at 1:01 PM Michael Armbrust <
>>>>> michael@databricks.com> wrote:
>>>>> >> >>
>>>>> >> >> I propose to add the following text to Spark's Semantic
>>>>> Versioning policy and adopt it as the rubric that should be used when
>>>>> deciding to break APIs (even at major versions such as 3.0).
>>>>> >> >>
>>>>> >> >>
>>>>> >> >> I'll leave the vote open until Tuesday, March 10th at 2pm. As
>>>>> this is a procedural vote, the measure will pass if there are more
>>>>> favourable votes than unfavourable ones. PMC votes are binding, but
>>>>> the community is encouraged to add their voice to the discussion.
>>>>> >> >>
>>>>> >> >>
>>>>> >> >> [ ] +1 - Spark should adopt this policy.
>>>>> >> >>
>>>>> >> >> [ ] -1  - Spark should not adopt this policy.
>>>>> >> >>
>>>>> >> >>
>>>>> >> >> <new policy>
>>>>> >> >>
>>>>> >> >>
>>>>> >> >> Considerations When Breaking APIs
>>>>> >> >>
>>>>> >> >> The Spark project strives to avoid breaking APIs or silently
>>>>> changing behavior, even at major versions. While this is not always
>>>>> possible, the balance of the following factors should be considered
>>>>> before choosing to break an API.
>>>>> >> >>
>>>>> >> >>
>>>>> >> >> Cost of Breaking an API
>>>>> >> >>
>>>>> >> >> Breaking an API almost always has a non-trivial cost to the
>>>>> users of Spark. A broken API means that Spark programs need to be
>>>>> rewritten before they can be upgraded. However, there are a few
>>>>> considerations when thinking about what the cost will be:
>>>>> >> >>
>>>>> >> >> Usage - an API that is actively used in many different places
>>>>> is always very costly to break. While it is hard to know usage for
>>>>> sure, there are a number of ways that we can estimate it:
>>>>> >> >>
>>>>> >> >> How long has the API been in Spark?
>>>>> >> >>
>>>>> >> >> Is the API common even for basic programs?
>>>>> >> >>
>>>>> >> >> How often do we see recent questions in JIRA or mailing lists?
>>>>> >> >>
>>>>> >> >> How often does it appear in StackOverflow or blogs?
>>>>> >> >>
>>>>> >> >> Behavior after the break - How will a program that works today
>>>>> work after the break? The following are listed roughly in order of
>>>>> increasing severity:
>>>>> >> >>
>>>>> >> >> Will there be a compiler or linker error?
>>>>> >> >>
>>>>> >> >> Will there be a runtime exception?
>>>>> >> >>
>>>>> >> >> Will that exception happen after significant processing has
>>>>> been done?
>>>>> >> >>
>>>>> >> >> Will we silently return different answers? (very hard to
>>>>> debug, might not even notice!)
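>>>>>
>>>>> >> >> As a toy sketch of that worst case (hypothetical code, not an
>>>>> actual Spark API): nothing fails at compile time or at run time after
>>>>> the change below; programs simply start getting different answers.
>>>>>
>>>>>     object MeanV1 {
>>>>>       // v1 semantics: the mean of an empty sequence is NaN (0.0 / 0)
>>>>>       def mean(xs: Seq[Double]): Double = xs.sum / xs.size
>>>>>     }
>>>>>
>>>>>     object MeanV2 {
>>>>>       // v2 "fixes" the empty case to return 0.0. Every caller still
>>>>>       // compiles and runs, but answers silently change.
>>>>>       def mean(xs: Seq[Double]): Double =
>>>>>         if (xs.isEmpty) 0.0 else xs.sum / xs.size
>>>>>     }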
>>>>> >> >>
>>>>> >> >>
>>>>> >> >> Cost of Maintaining an API
>>>>> >> >>
>>>>> >> >> Of course, the above does not mean that we will never break any
>>>>> APIs. We must also consider the cost both to the project and to our
>>>>> users of keeping the API in question.
>>>>> >> >>
>>>>> >> >> Project Costs - Every API we have needs to be tested and needs
>>>>> to keep working as other parts of the project change. These costs are
>>>>> significantly exacerbated when external dependencies change (the JVM,
>>>>> Scala, etc). In some cases, while not technically infeasible, the
>>>>> cost of maintaining a particular API can become too high.
>>>>> >> >>
>>>>> >> >> User Costs - APIs also have a cognitive cost to users learning
>>>>> Spark or trying to understand Spark programs. This cost becomes even
>>>>> higher when the API in question has confusing or undefined semantics.
>>>>> >> >>
>>>>> >> >>
>>>>> >> >> Alternatives to Breaking an API
>>>>> >> >>
>>>>> >> >> In cases where there is a "Bad API", but where the cost of
>>>>> removal is also high, there are alternatives that should be
>>>>> considered that do not hurt existing users but do address some of the
>>>>> maintenance costs.
>>>>> >> >>
>>>>> >> >>
>>>>> >> >> Avoid Bad APIs - While this is a bit obvious, it is an
>>>>> important point. Anytime we are adding a new interface to Spark we
>>>>> should consider that we might be stuck with this API forever. Think
>>>>> deeply about how new APIs relate to existing ones, as well as how you
>>>>> expect them to evolve over time.
>>>>> >> >>
>>>>> >> >> Deprecation Warnings - All deprecation warnings should point to
>>>>> a clear alternative and should never just say that an API is
>>>>> deprecated.
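>>>>>
>>>>> >> >> For example, a minimal Scala sketch (hypothetical method, not
>>>>> from Spark itself) of a warning that names its replacement:
>>>>>
>>>>>     object TableApi {
>>>>>       // Good: the message names the replacement and the version in
>>>>>       // which the deprecation happened, not just "deprecated".
>>>>>       @deprecated("Use readTable(name, options) instead", "3.0.0")
>>>>>       def readTable(name: String): Unit = readTable(name, Map.empty)
>>>>>
>>>>>       // The clear alternative the warning points to.
>>>>>       def readTable(name: String, options: Map[String, String]): Unit =
>>>>>         println(s"reading $name with options $options")
>>>>>     }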
>>>>> >> >>
>>>>> >> >> Updated Docs - Documentation should point to the "best"
>>>>> recommended way of performing a given task. In the cases where we
>>>>> maintain legacy documentation, we should clearly point to newer APIs
>>>>> and suggest to users the "right" way.
>>>>> >> >>
>>>>> >> >> Community Work - Many people learn Spark by reading blogs and
>>>>> other sites such as StackOverflow. However, many of these resources
>>>>> are out of date. Update them to reduce the cost of eventually
>>>>> removing deprecated APIs.
>>>>> >> >>
>>>>> >> >>
>>>>> >> >> </new policy>
>>>>> >>
>>>>>
>>>>
>>
>> --
>> ---
>> Takeshi Yamamuro
>>
>
