spark-dev mailing list archives

From Xiao Li <lix...@databricks.com>
Subject Re: [VOTE] Amend Spark's Semantic Versioning Policy
Date Mon, 09 Mar 2020 16:35:01 GMT
+1 (binding)

Xiao

On Mon, Mar 9, 2020 at 8:33 AM Denny Lee <denny.g.lee@gmail.com> wrote:

> +1 (non-binding)
>
> On Mon, Mar 9, 2020 at 1:59 AM Hyukjin Kwon <gurwls223@gmail.com> wrote:
>
>> The proposal itself seems good as a set of factors to consider. Thanks, Michael.
>>
>> Several of the concerns mentioned raise good points, in particular:
>>
>> > ... assuming that this is for public stable APIs, not APIs that are
>> > marked as unstable, evolving, etc. ...
>> I would like to confirm this. We already have API annotations such as
>> Experimental, Unstable, etc., and the implications of each remain in
>> effect. If this is for stable APIs, it makes sense to me as well.
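>>
>> For concreteness, here is a minimal sketch of how such annotations mark
>> an API (the annotations exist in org.apache.spark.annotation; the object
>> and method below are hypothetical):
>>
>>   import org.apache.spark.annotation.{Experimental, Unstable}
>>
>>   object HypotheticalApi {
>>     // Annotated as experimental/unstable: callers are warned that this
>>     // method may change or be removed in any release, so the proposed
>>     // rubric would apply to it differently than to stable APIs.
>>     @Experimental
>>     @Unstable
>>     def riskyTransform(input: String): String = input.reverse
>>   }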
>>
>> > ... can we expand on 'when' an API change can occur? Since we are
>> > proposing to diverge from semver. ...
>> I think this is a good point. If we're proposing to diverge from semver,
>> the delta compared to semver will have to be clarified to avoid different
>> personal interpretations of the somewhat general principles.
>>
>> > ... can we narrow down on the migration from Apache Spark 2.4.5 to
>> > Apache Spark 3.0+? ...
>>
>> Assuming these concerns will be addressed, +1 (binding).
>>
>>
>> On Mon, Mar 9, 2020 at 4:53 PM Takeshi Yamamuro <linguin.m.s@gmail.com> wrote:
>>
>>> +1 (non-binding)
>>>
>>> Bests,
>>> Takeshi
>>>
>>> On Mon, Mar 9, 2020 at 4:52 PM Gengliang Wang <
>>> gengliang.wang@databricks.com> wrote:
>>>
>>>> +1 (non-binding)
>>>>
>>>> Gengliang
>>>>
>>>> On Mon, Mar 9, 2020 at 12:22 AM Matei Zaharia <matei.zaharia@gmail.com>
>>>> wrote:
>>>>
>>>>> +1 as well.
>>>>>
>>>>> Matei
>>>>>
>>>>> On Mar 9, 2020, at 12:05 AM, Wenchen Fan <cloud0fan@gmail.com> wrote:
>>>>>
>>>>> +1 (binding), assuming that this is for public stable APIs, not APIs
>>>>> that are marked as unstable, evolving, etc.
>>>>>
>>>>> On Mon, Mar 9, 2020 at 1:10 AM Ismaël Mejía <iemejia@gmail.com>
wrote:
>>>>>
>>>>>> +1 (non-binding)
>>>>>>
>>>>>> Michael's section on the trade-offs of maintaining / removing an API
>>>>>> is one of the best reads I have seen on this mailing list.
>>>>>> Enthusiastic +1
>>>>>>
>>>>>> On Sat, Mar 7, 2020 at 8:28 PM Dongjoon Hyun <dongjoon.hyun@gmail.com>
>>>>>> wrote:
>>>>>> >
>>>>>> > This new policy has a good intention, but can we narrow down on
>>>>>> > the migration from Apache Spark 2.4.5 to Apache Spark 3.0+?
>>>>>> >
>>>>>> > I saw that there already exists a reverting PR to bring back
>>>>>> > Spark 1.4 and 1.5 APIs based on this AS-IS suggestion.
>>>>>> >
>>>>>> > The AS-IS policy clearly mentions the JVM/Scala-level difficulty,
>>>>>> > which is nice.
>>>>>> >
>>>>>> > However, for the other cases, it sounds like `recommending older
>>>>>> > APIs as much as possible` due to the following:
>>>>>> >
>>>>>> >      > How long has the API been in Spark?
>>>>>> >
>>>>>> > We should be more careful when we add a new policy, and should
>>>>>> > aim not to mislead users and 3rd-party library developers into
>>>>>> > thinking "older is better".
>>>>>> >
>>>>>> > Technically, I'm wondering who will use new APIs in their
>>>>>> > examples (in books and on StackOverflow) if they always need to
>>>>>> > add a warning like `this only works on 2.4.0+`.
>>>>>> >
>>>>>> > Bests,
>>>>>> > Dongjoon.
>>>>>> >
>>>>>> > On Fri, Mar 6, 2020 at 7:10 PM Mridul Muralidharan <mridul@gmail.com> wrote:
>>>>>> >>
>>>>>> >> I am in broad agreement with the proposal; like any developer,
>>>>>> >> I prefer stable, well-designed APIs :-)
>>>>>> >>
>>>>>> >> Can we tie the proposal to the stability guarantees given by
>>>>>> >> Spark and the reasonable expectations of users?
>>>>>> >> In my opinion, an unstable or evolving API could change, while
>>>>>> >> an experimental API which has been around for ages should be
>>>>>> >> handled more conservatively.
>>>>>> >> Which raises the question of how the stability guarantees
>>>>>> >> specified by the annotations interact with the proposal.
>>>>>> >>
>>>>>> >> Also, can we expand on 'when' an API change can occur, since we
>>>>>> >> are proposing to diverge from semver? Patch release? Minor
>>>>>> >> release? Only major release? Based on the 'impact' of the API?
>>>>>> >> Stability guarantees?
>>>>>> >>
>>>>>> >> Regards,
>>>>>> >> Mridul
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> On Fri, Mar 6, 2020 at 7:01 PM Michael Armbrust <michael@databricks.com> wrote:
>>>>>> >> >
>>>>>> >> > I'll start off the vote with a strong +1 (binding).
>>>>>> >> >
>>>>>> >> > On Fri, Mar 6, 2020 at 1:01 PM Michael Armbrust <michael@databricks.com> wrote:
>>>>>> >> >>
>>>>>> >> >> I propose to add the following text to Spark's Semantic
>>>>>> >> >> Versioning policy and adopt it as the rubric that should be
>>>>>> >> >> used when deciding to break APIs (even at major versions such
>>>>>> >> >> as 3.0).
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >> I'll leave the vote open until Tuesday, March 10th at 2pm. As
>>>>>> >> >> this is a procedural vote, the measure will pass if there are
>>>>>> >> >> more favourable votes than unfavourable ones. PMC votes are
>>>>>> >> >> binding, but the community is encouraged to add their voice to
>>>>>> >> >> the discussion.
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >> [ ] +1 - Spark should adopt this policy.
>>>>>> >> >>
>>>>>> >> >> [ ] -1  - Spark should not adopt this policy.
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >> <new policy>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >> Considerations When Breaking APIs
>>>>>> >> >>
>>>>>> >> >> The Spark project strives to avoid breaking APIs or silently
>>>>>> >> >> changing behavior, even at major versions. While this is not
>>>>>> >> >> always possible, the balance of the following factors should
>>>>>> >> >> be considered before choosing to break an API.
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >> Cost of Breaking an API
>>>>>> >> >>
>>>>>> >> >> Breaking an API almost always has a non-trivial cost to the
>>>>>> >> >> users of Spark. A broken API means that Spark programs need
>>>>>> >> >> to be rewritten before they can be upgraded. However, there
>>>>>> >> >> are a few considerations when thinking about what the cost
>>>>>> >> >> will be:
>>>>>> >> >>
>>>>>> >> >> Usage - an API that is actively used in many different places
>>>>>> >> >> is always very costly to break. While it is hard to know
>>>>>> >> >> usage for sure, there are a number of ways that we can
>>>>>> >> >> estimate it:
>>>>>> >> >>
>>>>>> >> >> How long has the API been in Spark?
>>>>>> >> >>
>>>>>> >> >> Is the API common even for basic programs?
>>>>>> >> >>
>>>>>> >> >> How often do we see recent questions in JIRA or mailing lists?
>>>>>> >> >>
>>>>>> >> >> How often does it appear in StackOverflow or blogs?
>>>>>> >> >>
>>>>>> >> >> Behavior after the break - How will a program that works
>>>>>> >> >> today work after the break? The following are listed roughly
>>>>>> >> >> in order of increasing severity:
>>>>>> >> >>
>>>>>> >> >> Will there be a compiler or linker error?
>>>>>> >> >>
>>>>>> >> >> Will there be a runtime exception?
>>>>>> >> >>
>>>>>> >> >> Will that exception happen after significant processing has
>>>>>> >> >> been done?
>>>>>> >> >>
>>>>>> >> >> Will we silently return different answers? (very hard to
>>>>>> >> >> debug, might not even notice!)
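>>>>>> >> >>
>>>>>> >> >> As a rough sketch of why that last case is the most severe,
>>>>>> >> >> consider lenient vs. strict date resolution (a made-up
>>>>>> >> >> before/after, using only java.time from the JDK):
>>>>>> >> >>
>>>>>> >> >>   import java.time.LocalDate
>>>>>> >> >>   import java.time.format.{DateTimeFormatter, ResolverStyle}
>>>>>> >> >>
>>>>>> >> >>   object SilentChangeSketch {
>>>>>> >> >>     private def parser(style: ResolverStyle) =
>>>>>> >> >>       DateTimeFormatter.ofPattern("uuuu-MM-dd").withResolverStyle(style)
>>>>>> >> >>
>>>>>> >> >>     def main(args: Array[String]): Unit = {
>>>>>> >> >>       // "Old" lenient behavior: Feb 30 quietly rolls over to Mar 1.
>>>>>> >> >>       println(LocalDate.parse("2020-02-30", parser(ResolverStyle.LENIENT)))
>>>>>> >> >>       // "New" strict behavior: the same input would instead throw a
>>>>>> >> >>       // DateTimeParseException at runtime:
>>>>>> >> >>       // LocalDate.parse("2020-02-30", parser(ResolverStyle.STRICT))
>>>>>> >> >>     }
>>>>>> >> >>   }
>>>>>> >> >>
>>>>>> >> >> A program compiles and runs under both behaviors but can
>>>>>> >> >> quietly produce different answers, which is far harder to
>>>>>> >> >> detect than a compile error.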
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >> Cost of Maintaining an API
>>>>>> >> >>
>>>>>> >> >> Of course, the above does not mean that we will never break
>>>>>> >> >> any APIs. We must also consider the cost, both to the project
>>>>>> >> >> and to our users, of keeping the API in question.
>>>>>> >> >>
>>>>>> >> >> Project Costs - Every API we have needs to be tested and
>>>>>> >> >> needs to keep working as other parts of the project change.
>>>>>> >> >> These costs are significantly exacerbated when external
>>>>>> >> >> dependencies change (the JVM, Scala, etc.). In some cases,
>>>>>> >> >> while not completely technically infeasible, the cost of
>>>>>> >> >> maintaining a particular API can become too high.
>>>>>> >> >>
>>>>>> >> >> User Costs - APIs also have a cognitive cost to users
>>>>>> >> >> learning Spark or trying to understand Spark programs. This
>>>>>> >> >> cost becomes even higher when the API in question has
>>>>>> >> >> confusing or undefined semantics.
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >> Alternatives to Breaking an API
>>>>>> >> >>
>>>>>> >> >> In cases where there is a "Bad API", but where the cost of
>>>>>> >> >> removal is also high, there are alternatives that should be
>>>>>> >> >> considered that do not hurt existing users but do address
>>>>>> >> >> some of the maintenance costs.
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >> Avoid Bad APIs - While this is a bit obvious, it is an
>>>>>> >> >> important point. Anytime we are adding a new interface to
>>>>>> >> >> Spark we should consider that we might be stuck with this API
>>>>>> >> >> forever. Think deeply about how new APIs relate to existing
>>>>>> >> >> ones, as well as how you expect them to evolve over time.
>>>>>> >> >>
>>>>>> >> >> Deprecation Warnings - All deprecation warnings should point
>>>>>> >> >> to a clear alternative and should never just say that an API
>>>>>> >> >> is deprecated.
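>>>>>> >> >>
>>>>>> >> >> A minimal sketch of that pattern with Scala's built-in
>>>>>> >> >> @deprecated annotation (both methods are hypothetical):
>>>>>> >> >>
>>>>>> >> >>   object DeprecationSketch {
>>>>>> >> >>     // Good: the message names a concrete replacement and the
>>>>>> >> >>     // version, rather than just saying "deprecated".
>>>>>> >> >>     @deprecated("Use upperCaseRoot, which is locale-safe", "3.0.0")
>>>>>> >> >>     def upperCase(s: String): String = s.toUpperCase
>>>>>> >> >>
>>>>>> >> >>     def upperCaseRoot(s: String): String =
>>>>>> >> >>       s.toUpperCase(java.util.Locale.ROOT)
>>>>>> >> >>   }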
>>>>>> >> >>
>>>>>> >> >> Updated Docs - Documentation should point to the "best"
>>>>>> >> >> recommended way of performing a given task. In the cases
>>>>>> >> >> where we maintain legacy documentation, we should clearly
>>>>>> >> >> point to newer APIs and suggest to users the "right" way.
>>>>>> >> >>
>>>>>> >> >> Community Work - Many people learn Spark by reading blogs and
>>>>>> >> >> other sites such as StackOverflow. However, many of these
>>>>>> >> >> resources are out of date. Update them to reduce the cost of
>>>>>> >> >> eventually removing deprecated APIs.
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >> </new policy>
>>>>>> >>
>>>>>> >>
>>>>>
>>>
>>> --
>>> ---
>>> Takeshi Yamamuro
>>>
>>

-- 
<https://databricks.com/sparkaisummit/north-america>
