spark-dev mailing list archives

From Holden Karau <hol...@pigscanfly.ca>
Subject Re: [VOTE] Amend Spark's Semantic Versioning Policy
Date Tue, 10 Mar 2020 00:20:13 GMT
+1 (binding) on the original proposal.

On Mon, Mar 9, 2020 at 1:32 PM Michael Heuer <heuermh@gmail.com> wrote:

> +1 (non-binding)
>
> I am disappointed, however, that this only mentions APIs and not
> dependencies and transitive dependencies.
>
I think upgrading dependencies continues to be reasonable.

>
> As Spark does not provide separation between its runtime classpath and the
> classpath used by applications, I believe Spark's dependencies and
> transitive dependencies should be considered part of the API for this
> policy.  Breaking dependency upgrades and incompatible dependency versions
> are the source of much frustration.
>
I myself have also faced this frustration. I believe we've increased some
shading to help here. Are there specific pain points you've experienced?
Maybe we can factor this discussion out into another thread.
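To make that concrete, shading on the application side can look something
like the following minimal build.sbt sketch, assuming the sbt-assembly
plugin; Guava and the relocation target here are only illustrative:

    // build.sbt -- relocate the app's copy of Guava so it cannot clash
    // with whatever version is already on Spark's runtime classpath.
    // Requires the sbt-assembly plugin in project/plugins.sbt.
    assembly / assemblyShadeRules := Seq(
      ShadeRule.rename("com.google.common.**" -> "myapp.shaded.guava.@1").inAll
    )

The application keeps compiling against the normal Guava API; only the
bytecode in the assembled jar is rewritten to the relocated package.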

>
>

>    michael
>
>
> On Mar 9, 2020, at 2:16 PM, Takuya UESHIN <ueshin@happy-camper.st> wrote:
>
> +1 (binding)
>
>
> On Mon, Mar 9, 2020 at 11:49 AM Xingbo Jiang <jiangxb1987@gmail.com>
> wrote:
>
>> +1 (non-binding)
>>
>> Cheers,
>>
>> Xingbo
>>
>> On Mon, Mar 9, 2020 at 9:35 AM Xiao Li <lixiao@databricks.com> wrote:
>>
>>> +1 (binding)
>>>
>>> Xiao
>>>
>>> On Mon, Mar 9, 2020 at 8:33 AM Denny Lee <denny.g.lee@gmail.com> wrote:
>>>
>>>> +1 (non-binding)
>>>>
>>>> On Mon, Mar 9, 2020 at 1:59 AM Hyukjin Kwon <gurwls223@gmail.com>
>>>> wrote:
>>>>
>>>>> The proposal itself seems good as a set of factors to consider.
>>>>> Thanks, Michael.
>>>>>
>>>>> Several of the concerns mentioned look like good points, in particular:
>>>>>
>>>>> > ... assuming that this is for public stable APIs, not APIs that are
>>>>> > marked as unstable, evolving, etc. ...
>>>>> I would like to confirm this. We already have API annotations such as
>>>>> Experimental, Unstable, etc., and the implications of each still apply.
>>>>> If it's for stable APIs, it makes sense to me as well.
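For reference, a minimal Scala sketch of how those annotations appear on an
API surface; the class and methods are hypothetical, and only the
annotations from org.apache.spark.annotation are real:

    import org.apache.spark.annotation.{Evolving, Experimental, Unstable}

    // Hypothetical API: the annotations declare which guarantees callers get.
    @Experimental              // may change or be removed in any release
    class VectorizedReader {
      @Evolving                // may break compatibility in minor releases
      def read(partition: Int): Iterator[Array[Byte]] = Iterator.empty

      @Unstable                // no compatibility guarantees at all
      def internalBuffers: Seq[Array[Byte]] = Seq.empty
    }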
>>>>>
>>>>> > ... can we expand on 'when' an API change can occur, since we are
>>>>> > proposing to diverge from semver? ...
>>>>> I think this is a good point. If we're proposing to diverge from
>>>>> semver, the delta compared to semver will have to be clarified to
>>>>> avoid different personal interpretations of the somewhat general
>>>>> principles.
>>>>>
>>>>> > ... can we narrow down on the migration from Apache Spark 2.4.5
>>>>> > to Apache Spark 3.0+? ...
>>>>>
>>>>> Assuming these concerns will be addressed, +1 (binding).
>>>>>
>>>>>
>>>>> On Mon, Mar 9, 2020 at 4:53 PM Takeshi Yamamuro <linguin.m.s@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> +1 (non-binding)
>>>>>>
>>>>>> Bests,
>>>>>> Takeshi
>>>>>>
>>>>>> On Mon, Mar 9, 2020 at 4:52 PM Gengliang Wang <
>>>>>> gengliang.wang@databricks.com> wrote:
>>>>>>
>>>>>>> +1 (non-binding)
>>>>>>>
>>>>>>> Gengliang
>>>>>>>
>>>>>>> On Mon, Mar 9, 2020 at 12:22 AM Matei Zaharia <
>>>>>>> matei.zaharia@gmail.com> wrote:
>>>>>>>
>>>>>>>> +1 as well.
>>>>>>>>
>>>>>>>> Matei
>>>>>>>>
>>>>>>>> On Mar 9, 2020, at 12:05 AM, Wenchen Fan <cloud0fan@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> +1 (binding), assuming that this is for public stable APIs, not
>>>>>>>> APIs that are marked as unstable, evolving, etc.
>>>>>>>>
>>>>>>>> On Mon, Mar 9, 2020 at 1:10 AM Ismaël Mejía <iemejia@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> +1 (non-binding)
>>>>>>>>>
>>>>>>>>> Michael's section on the trade-offs of maintaining / removing an
>>>>>>>>> API is one of the best reads I have seen on this mailing list.
>>>>>>>>> Enthusiastic +1
>>>>>>>>>
>>>>>>>>> On Sat, Mar 7, 2020 at 8:28 PM Dongjoon Hyun <
>>>>>>>>> dongjoon.hyun@gmail.com> wrote:
>>>>>>>>> >
>>>>>>>>> > This new policy has a good intention, but can we narrow down
>>>>>>>>> > on the migration from Apache Spark 2.4.5 to Apache Spark 3.0+?
>>>>>>>>> >
>>>>>>>>> > I saw that there already exists a reverting PR to bring back
>>>>>>>>> > Spark 1.4 and 1.5 APIs based on this AS-IS suggestion.
>>>>>>>>> >
>>>>>>>>> > The AS-IS policy clearly mentions the JVM/Scala-level
>>>>>>>>> > difficulty, which is nice.
>>>>>>>>> >
>>>>>>>>> > However, for the other cases, it sounds like `recommending
>>>>>>>>> > older APIs as much as possible` due to the following.
>>>>>>>>> >
>>>>>>>>> >      > How long has the API been in Spark?
>>>>>>>>> >
>>>>>>>>> > We had better be more careful when we add a new policy and
>>>>>>>>> > should aim not to mislead users and 3rd-party library developers
>>>>>>>>> > into thinking "older is better".
>>>>>>>>> >
>>>>>>>>> > Technically, I'm wondering who will use new APIs in their
>>>>>>>>> > examples (in books and on StackOverflow) if they always need to
>>>>>>>>> > write an additional warning like `this only works at 2.4.0+`.
>>>>>>>>> >
>>>>>>>>> > Bests,
>>>>>>>>> > Dongjoon.
>>>>>>>>> >
>>>>>>>>> > On Fri, Mar 6, 2020 at 7:10 PM Mridul Muralidharan <
>>>>>>>>> > mridul@gmail.com> wrote:
>>>>>>>>> >>
>>>>>>>>> >> I am in broad agreement with the proposal; as any developer, I
>>>>>>>>> >> prefer stable, well-designed APIs :-)
>>>>>>>>> >>
>>>>>>>>> >> Can we tie the proposal to the stability guarantees given by
>>>>>>>>> >> Spark and reasonable expectations from users?
>>>>>>>>> >> In my opinion, an unstable or evolving API could change, while
>>>>>>>>> >> an experimental API which has been around for ages should be
>>>>>>>>> >> handled more conservatively.
>>>>>>>>> >> Which brings into question how the stability guarantees
>>>>>>>>> >> specified by annotations interact with the proposal.
>>>>>>>>> >>
>>>>>>>>> >> Also, can we expand on 'when' an API change can occur, since
>>>>>>>>> >> we are proposing to diverge from semver?
>>>>>>>>> >> Patch release? Minor release? Only major release? Based on the
>>>>>>>>> >> 'impact' of the API? Stability guarantees?
>>>>>>>>> >>
>>>>>>>>> >> Regards,
>>>>>>>>> >> Mridul
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >> On Fri, Mar 6, 2020 at 7:01 PM Michael Armbrust <
>>>>>>>>> >> michael@databricks.com> wrote:
>>>>>>>>> >> >
>>>>>>>>> >> > I'll start off the vote with a strong +1 (binding).
>>>>>>>>> >> >
>>>>>>>>> >> > On Fri, Mar 6, 2020 at 1:01 PM Michael Armbrust <
>>>>>>>>> >> > michael@databricks.com> wrote:
>>>>>>>>> >> >>
>>>>>>>>> >> >> I propose to add the following text to Spark's Semantic
>>>>>>>>> >> >> Versioning policy and adopt it as the rubric that should be
>>>>>>>>> >> >> used when deciding to break APIs (even at major versions
>>>>>>>>> >> >> such as 3.0).
>>>>>>>>> >> >>
>>>>>>>>> >> >>
>>>>>>>>> >> >> I'll leave the vote open until Tuesday, March 10th at 2pm.
>>>>>>>>> >> >> As this is a procedural vote, the measure will pass if there
>>>>>>>>> >> >> are more favourable votes than unfavourable ones. PMC votes
>>>>>>>>> >> >> are binding, but the community is encouraged to add their
>>>>>>>>> >> >> voice to the discussion.
>>>>>>>>> >> >>
>>>>>>>>> >> >>
>>>>>>>>> >> >> [ ] +1 - Spark should adopt this policy.
>>>>>>>>> >> >>
>>>>>>>>> >> >> [ ] -1 - Spark should not adopt this policy.
>>>>>>>>> >> >>
>>>>>>>>> >> >>
>>>>>>>>> >> >> <new policy>
>>>>>>>>> >> >>
>>>>>>>>> >> >>
>>>>>>>>> >> >> Considerations When Breaking APIs
>>>>>>>>> >> >>
>>>>>>>>> >> >> The Spark project strives to avoid breaking APIs or
>>>>>>>>> >> >> silently changing behavior, even at major versions. While
>>>>>>>>> >> >> this is not always possible, the balance of the following
>>>>>>>>> >> >> factors should be considered before choosing to break an
>>>>>>>>> >> >> API.
>>>>>>>>> >> >>
>>>>>>>>> >> >>
>>>>>>>>> >> >> Cost of Breaking an API
>>>>>>>>> >> >>
>>>>>>>>> >> >> Breaking an API almost always has a non-trivial cost to the
>>>>>>>>> >> >> users of Spark. A broken API means that Spark programs need
>>>>>>>>> >> >> to be rewritten before they can be upgraded. However, there
>>>>>>>>> >> >> are a few considerations when thinking about what the cost
>>>>>>>>> >> >> will be:
>>>>>>>>> >> >>
>>>>>>>>> >> >> Usage - an API that is actively used in many different
>>>>>>>>> >> >> places is always very costly to break. While it is hard to
>>>>>>>>> >> >> know usage for sure, there are a bunch of ways that we can
>>>>>>>>> >> >> estimate:
>>>>>>>>> >> >>
>>>>>>>>> >> >> How long has the API been in Spark?
>>>>>>>>> >> >>
>>>>>>>>> >> >> Is the API common even for basic programs?
>>>>>>>>> >> >>
>>>>>>>>> >> >> How often do we see recent questions in JIRA or mailing
>>>>>>>>> >> >> lists?
>>>>>>>>> >> >>
>>>>>>>>> >> >> How often does it appear in StackOverflow or blogs?
>>>>>>>>> >> >>
>>>>>>>>> >> >> Behavior after the break - How will a program that works
>>>>>>>>> >> >> today work after the break? The following are listed
>>>>>>>>> >> >> roughly in order of increasing severity:
>>>>>>>>> >> >>
>>>>>>>>> >> >> Will there be a compiler or linker error?
>>>>>>>>> >> >>
>>>>>>>>> >> >> Will there be a runtime exception?
>>>>>>>>> >> >>
>>>>>>>>> >> >> Will that exception happen after significant processing
>>>>>>>>> >> >> has been done?
>>>>>>>>> >> >>
>>>>>>>>> >> >> Will we silently return different answers? (very hard to
>>>>>>>>> >> >> debug, might not even notice!)
>>>>>>>>> >> >>
>>>>>>>>> >> >>
>>>>>>>>> >> >> Cost of Maintaining an API
>>>>>>>>> >> >>
>>>>>>>>> >> >> Of course, the above does not mean that we will never break
>>>>>>>>> >> >> any APIs. We must also consider the cost both to the project
>>>>>>>>> >> >> and to our users of keeping the API in question.
>>>>>>>>> >> >>
>>>>>>>>> >> >> Project Costs - Every API we have needs to be tested and
>>>>>>>>> >> >> needs to keep working as other parts of the project change.
>>>>>>>>> >> >> These costs are significantly exacerbated when external
>>>>>>>>> >> >> dependencies change (the JVM, Scala, etc). In some cases,
>>>>>>>>> >> >> while not completely technically infeasible, the cost of
>>>>>>>>> >> >> maintaining a particular API can become too high.
>>>>>>>>> >> >>
>>>>>>>>> >> >> User Costs - APIs also have a cognitive cost to users
>>>>>>>>> >> >> learning Spark or trying to understand Spark programs. This
>>>>>>>>> >> >> cost becomes even higher when the API in question has
>>>>>>>>> >> >> confusing or undefined semantics.
>>>>>>>>> >> >>
>>>>>>>>> >> >>
>>>>>>>>> >> >> Alternatives to Breaking an API
>>>>>>>>> >> >>
>>>>>>>>> >> >> In cases where there is a "Bad API", but where the cost of
>>>>>>>>> >> >> removal is also high, there are alternatives that should be
>>>>>>>>> >> >> considered that do not hurt existing users but do address
>>>>>>>>> >> >> some of the maintenance costs.
>>>>>>>>> >> >>
>>>>>>>>> >> >>
>>>>>>>>> >> >> Avoid Bad APIs - While this is a bit obvious, it is an
>>>>>>>>> >> >> important point. Anytime we are adding a new interface to
>>>>>>>>> >> >> Spark we should consider that we might be stuck with this
>>>>>>>>> >> >> API forever. Think deeply about how new APIs relate to
>>>>>>>>> >> >> existing ones, as well as how you expect them to evolve
>>>>>>>>> >> >> over time.
>>>>>>>>> >> >>
>>>>>>>>> >> >> Deprecation Warnings - All deprecation warnings should
>>>>>>>>> >> >> point to a clear alternative and should never just say that
>>>>>>>>> >> >> an API is deprecated.
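As a concrete illustration of that guidance, a deprecation in Scala might
look like this minimal sketch (approxDistinct/countDistinct are
hypothetical names, not Spark APIs):

    object ExampleApi {
      // The message names the replacement and the release that deprecated
      // it, rather than just saying the method is deprecated.
      @deprecated("Use `countDistinct` instead.", since = "3.0.0")
      def approxDistinct(xs: Seq[Int]): Long = countDistinct(xs)

      def countDistinct(xs: Seq[Int]): Long = xs.distinct.size.toLong
    }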
>>>>>>>>> >> >>
>>>>>>>>> >> >> Updated Docs - Documentation should point to the "best"
>>>>>>>>> >> >> recommended way of performing a given task. In the cases
>>>>>>>>> >> >> where we maintain legacy documentation, we should clearly
>>>>>>>>> >> >> point to newer APIs and suggest to users the "right" way.
>>>>>>>>> >> >>
>>>>>>>>> >> >> Community Work - Many people learn Spark by reading blogs
>>>>>>>>> >> >> and other sites such as StackOverflow. However, many of
>>>>>>>>> >> >> these resources are out of date. Update them to reduce the
>>>>>>>>> >> >> cost of eventually removing deprecated APIs.
>>>>>>>>> >> >>
>>>>>>>>> >> >>
>>>>>>>>> >> >> </new policy>
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>> --
>>>>>> ---
>>>>>> Takeshi Yamamuro
>>>>>>
>>>>>
>>>
>>> --
>>> <https://databricks.com/sparkaisummit/north-america>
>>>
>>
>
> --
> Takuya UESHIN
>
> http://twitter.com/ueshin
>
>
>

-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
