spark-dev mailing list archives

From Michael Heuer <heue...@gmail.com>
Subject Re: [VOTE] Amend Spark's Semantic Versioning Policy
Date Mon, 09 Mar 2020 20:31:54 GMT
+1 (non-binding)

I am disappointed, however, that this only mentions APIs and not dependencies and transitive dependencies.

As Spark does not provide separation between its runtime classpath and the classpath used
by applications, I believe Spark's dependencies and transitive dependencies should be considered
part of the API for this policy.  Breaking dependency upgrades and incompatible dependency
versions are the source of much frustration.
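
As a concrete sketch of the kind of friction I mean (the library and version numbers
below are illustrative, not an exact match for any Spark release), an application is
often forced to pin a transitive dependency back to whatever is on Spark's runtime
classpath:

    // build.sbt -- hypothetical application build
    libraryDependencies ++= Seq(
      // Spark is "provided": the cluster supplies it at runtime.
      "org.apache.spark" %% "spark-sql" % "2.4.5" % Provided,
      // A library we want, which transitively pulls a newer jackson-databind...
      "org.example" %% "some-library" % "1.0.0"
    )
    // ...so we pin jackson-databind back to the version on Spark's classpath,
    // or risk NoSuchMethodError / NoClassDefFoundError mid-job at runtime.
    dependencyOverrides +=
      "com.fasterxml.jackson.core" % "jackson-databind" % "2.6.7.3"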

   michael


> On Mar 9, 2020, at 2:16 PM, Takuya UESHIN <ueshin@happy-camper.st> wrote:
> 
> +1 (binding)
> 
> 
> On Mon, Mar 9, 2020 at 11:49 AM Xingbo Jiang <jiangxb1987@gmail.com> wrote:
> +1 (non-binding)
> 
> Cheers,
> 
> Xingbo
> 
> On Mon, Mar 9, 2020 at 9:35 AM Xiao Li <lixiao@databricks.com> wrote:
> +1 (binding)
> 
> Xiao
> 
> On Mon, Mar 9, 2020 at 8:33 AM Denny Lee <denny.g.lee@gmail.com> wrote:
> +1 (non-binding)
> 
> On Mon, Mar 9, 2020 at 1:59 AM Hyukjin Kwon <gurwls223@gmail.com> wrote:
> The proposal itself seems good as a set of factors to consider. Thanks, Michael.
> 
> Several of the concerns mentioned are good points, in particular:
> 
> > ... assuming that this is for public stable APIs, not APIs that are marked as unstable, evolving, etc. ...
> I would like to confirm this. We already have API annotations such as Experimental, Unstable, etc., and the implications of each still apply. If this is for stable APIs, it makes sense to me as well.
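> 
> For reference, a tiny sketch of how those annotations look in code (the annotation
> names match Spark's org.apache.spark.annotation package; the classes themselves are
> hypothetical):
> 
>     import org.apache.spark.annotation.{Experimental, Unstable}
> 
>     // Hypothetical API surfaces carrying different stability promises.
>     @Experimental
>     class NewFeatureReader   // may change or be removed in any release
> 
>     @Unstable
>     class InternalShim       // carries no stability guarantee at all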
> 
> > ... can we expand on 'when' an API change can occur? Since we are proposing to diverge from semver. ...
> I think this is a good point. If we're proposing to diverge from semver, the delta compared to semver will have to be clarified to avoid differing personal interpretations of the somewhat general principles.
> 
> > ... can we narrow down on the migration from Apache Spark 2.4.5 to Apache Spark 3.0+? ...
> 
> Assuming these concerns will be addressed, +1 (binding).
> 
>  
> On Mon, Mar 9, 2020 at 4:53 PM Takeshi Yamamuro <linguin.m.s@gmail.com> wrote:
> +1 (non-binding)
> 
> Bests,
> Takeshi
> 
> On Mon, Mar 9, 2020 at 4:52 PM Gengliang Wang <gengliang.wang@databricks.com> wrote:
> +1 (non-binding)
> 
> Gengliang
> 
> On Mon, Mar 9, 2020 at 12:22 AM Matei Zaharia <matei.zaharia@gmail.com> wrote:
> +1 as well.
> 
> Matei
> 
>> On Mar 9, 2020, at 12:05 AM, Wenchen Fan <cloud0fan@gmail.com> wrote:
>> 
>> +1 (binding), assuming that this is for public stable APIs, not APIs that are marked as unstable, evolving, etc.
>> 
>> On Mon, Mar 9, 2020 at 1:10 AM Ismaël Mejía <iemejia@gmail.com> wrote:
>> +1 (non-binding)
>> 
>> Michael's section on the trade-offs of maintaining / removing an API is one of
>> the best reads I have seen on this mailing list. Enthusiastic +1
>> 
>> On Sat, Mar 7, 2020 at 8:28 PM Dongjoon Hyun <dongjoon.hyun@gmail.com> wrote:
>> >
>> > This new policy has a good intention, but can we narrow down on the migration from Apache Spark 2.4.5 to Apache Spark 3.0+?
>> >
>> > I saw that there already exists a reverting PR to bring back Spark 1.4 and 1.5 APIs based on this AS-IS suggestion.
>> >
>> > The AS-IS policy clearly mentions the JVM/Scala-level difficulty, which is nice.
>> >
>> > However, for the other cases, it sounds like `recommending older APIs as much as possible`, due to the following:
>> >
>> >      > How long has the API been in Spark?
>> >
>> > We had better be more careful when we add a new policy, and should aim not to mislead users and 3rd-party library developers into thinking "older is better".
>> >
>> > Technically, I'm wondering who will use new APIs in their examples (in books and on StackOverflow) if they always need to add a warning like `this only works on 2.4.0+`.
>> >
>> > Bests,
>> > Dongjoon.
>> >
>> > On Fri, Mar 6, 2020 at 7:10 PM Mridul Muralidharan <mridul@gmail.com> wrote:
>> >>
>> >> I am in broad agreement with the proposal; as any developer, I prefer
>> >> stable, well-designed APIs :-)
>> >>
>> >> Can we tie the proposal to the stability guarantees given by Spark and
>> >> the reasonable expectations of users?
>> >> In my opinion, an unstable or evolving API could change, while an
>> >> experimental API which has been around for ages should be handled more
>> >> conservatively.
>> >> This brings into question how the stability guarantees specified by
>> >> annotations interact with the proposal.
>> >>
>> >> Also, can we expand on 'when' an API change can occur, since we are
>> >> proposing to diverge from semver?
>> >> Patch release? Minor release? Only major release? Based on the 'impact'
>> >> of the API? Stability guarantees?
>> >>
>> >> Regards,
>> >> Mridul
>> >>
>> >>
>> >>
>> >> On Fri, Mar 6, 2020 at 7:01 PM Michael Armbrust <michael@databricks.com> wrote:
>> >> >
>> >> > I'll start off the vote with a strong +1 (binding).
>> >> >
>> >> > On Fri, Mar 6, 2020 at 1:01 PM Michael Armbrust <michael@databricks.com> wrote:
>> >> >>
>> >> >> I propose to add the following text to Spark's Semantic Versioning policy and adopt it as the rubric that should be used when deciding to break APIs (even at major versions such as 3.0).
>> >> >>
>> >> >>
>> >> >> I'll leave the vote open until Tuesday, March 10th at 2pm. As this is a procedural vote, the measure will pass if there are more favourable votes than unfavourable ones. PMC votes are binding, but the community is encouraged to add their voice to the discussion.
>> >> >>
>> >> >>
>> >> >> [ ] +1 - Spark should adopt this policy.
>> >> >>
>> >> >> [ ] -1  - Spark should not adopt this policy.
>> >> >>
>> >> >>
>> >> >> <new policy>
>> >> >>
>> >> >>
>> >> >> Considerations When Breaking APIs
>> >> >>
>> >> >> The Spark project strives to avoid breaking APIs or silently changing behavior, even at major versions. While this is not always possible, the balance of the following factors should be considered before choosing to break an API.
>> >> >>
>> >> >>
>> >> >> Cost of Breaking an API
>> >> >>
>> >> >> Breaking an API almost always has a non-trivial cost to the users of Spark. A broken API means that Spark programs need to be rewritten before they can be upgraded. However, there are a few considerations when thinking about what the cost will be:
>> >> >>
>> >> >> Usage - an API that is actively used in many different places is always very costly to break. While it is hard to know usage for sure, there are a number of ways that we can estimate it:
>> >> >>
>> >> >> How long has the API been in Spark?
>> >> >>
>> >> >> Is the API common even for basic programs?
>> >> >>
>> >> >> How often do we see recent questions in JIRA or mailing lists?
>> >> >>
>> >> >> How often does it appear in StackOverflow or blogs?
>> >> >>
>> >> >> Behavior after the break - How will a program that works today work after the break? The following are listed roughly in order of increasing severity:
>> >> >>
>> >> >> Will there be a compiler or linker error?
>> >> >>
>> >> >> Will there be a runtime exception?
>> >> >>
>> >> >> Will that exception happen after significant processing has been done?
>> >> >>
>> >> >> Will we silently return different answers? (very hard to debug, might not even notice!)
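>> >> >>
>> >> >> As a toy illustration of that worst case (a hypothetical function, not a
>> >> >> real Spark API), imagine a release silently flipping a default parameter:
>> >> >>
>> >> >>     // v1 signature: def count(rows: Seq[Option[Int]], dropNulls: Boolean = false): Int
>> >> >>     // v2 silently flips the default:
>> >> >>     def count(rows: Seq[Option[Int]], dropNulls: Boolean = true): Int =
>> >> >>       if (dropNulls) rows.count(_.isDefined) else rows.size
>> >> >>
>> >> >>     count(Seq(Some(1), None))  // v1 returns 2, v2 returns 1: same code, different answer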
>> >> >>
>> >> >>
>> >> >> Cost of Maintaining an API
>> >> >>
>> >> >> Of course, the above does not mean that we will never break any APIs. We must also consider the cost, both to the project and to our users, of keeping the API in question.
>> >> >>
>> >> >> Project Costs - Every API we have needs to be tested and needs to keep working as other parts of the project change. These costs are significantly exacerbated when external dependencies change (the JVM, Scala, etc.). In some cases, while maintaining a particular API is not technically infeasible, its cost can become too high.
>> >> >>
>> >> >> User Costs - APIs also have a cognitive cost to users learning Spark or trying to understand Spark programs. This cost becomes even higher when the API in question has confusing or undefined semantics.
>> >> >>
>> >> >>
>> >> >> Alternatives to Breaking an API
>> >> >>
>> >> >> In cases where there is a "Bad API", but where the cost of removal is also high, there are alternatives that should be considered that do not hurt existing users but do address some of the maintenance costs.
>> >> >>
>> >> >>
>> >> >> Avoid Bad APIs - While this is a bit obvious, it is an important point. Anytime we are adding a new interface to Spark, we should consider that we might be stuck with this API forever. Think deeply about how new APIs relate to existing ones, as well as how you expect them to evolve over time.
>> >> >>
>> >> >> Deprecation Warnings - All deprecation warnings should point to a clear alternative and should never just say that an API is deprecated (see the sketch after the policy text below).
>> >> >>
>> >> >> Updated Docs - Documentation should point to the "best" recommended way of performing a given task. In cases where we maintain legacy documentation, we should clearly point to newer APIs and suggest to users the "right" way.
>> >> >>
>> >> >> Community Work - Many people learn Spark by reading blogs and other sites such as StackOverflow. However, many of these resources are out of date. Update them to reduce the cost of eventually removing deprecated APIs.
>> >> >>
>> >> >>
>> >> >> </new policy>
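>> >> >>
>> >> >> To make the deprecation-warnings point above concrete, a minimal Scala
>> >> >> sketch (the method names are hypothetical, not actual Spark APIs):
>> >> >>
>> >> >>     // Good: the message names a concrete replacement.
>> >> >>     @deprecated("Use countExact() instead", "3.0.0")
>> >> >>     def approximateCount(): Long = countExact()
>> >> >>
>> >> >>     def countExact(): Long = 42L  // stand-in implementation for the sketch
>> >> >>
>> >> >>     // Bad: tells the user nothing about where to go.
>> >> >>     @deprecated("This method is deprecated", "3.0.0")
>> >> >>     def legacyCount(): Long = -1L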
>> >>
>> 
>> 
> 
> 
> 
> -- 
> ---
> Takeshi Yamamuro
> 
> 
> 
> -- 
> Takuya UESHIN
> 
> http://twitter.com/ueshin
