spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Reynold Xin" <r...@databricks.com>
Subject Re: Add spark dependency on on org.opencypher:okapi-shade.okapi
Date Wed, 16 Oct 2019 17:41:28 GMT
Just curious - did we discuss why this shouldn't be another Apache sister project?

On Wed, Oct 16, 2019 at 10:21 AM, Sean Owen < srowen@gmail.com > wrote:

> 
> 
> 
> We don't all have to agree on whether to add this -- there are like 10
> people with an opinion -- and I certainly would not veto it. In practice a
> medium-sized changes needs someone to review/merge it all the way through,
> and nobody strongly objecting. I too don't know what to make of the
> situation; what happened to the supporters here?
> 
> 
> 
> I am concerned about maintenance, as inevitably any new module falls on
> everyone to maintain to some degree, and people come and go despite their
> intentions. But that isn't the substance of why I personally wouldn't
> merge it. Just doesn't seem like it must live in Spark. But again this is
> my opinion; you don't need to convince me, just need to
> (re?)-convince a shepherd, sponsor for this change.
> 
> 
> 
> Voting on the dependency part or whatever is also not important. It's a
> detail, and already merged even.
> 
> 
> 
> The issue to hand is: if nobody supports reviewing and merging the rest of
> the change, what then? we can't leave it half implemented. The fallback
> plan is just to back it out and reconsider later. This would be a poor
> outcome process-wise, but better than leaving it incomplete.
> 
> 
> 
> On Wed, Oct 16, 2019 at 3:15 AM Martin Junghanns
> < martin. junghanns@ neo4j. com ( martin.junghanns@neo4j.com ) > wrote:
> 
> 
>> 
>> 
>> I'm slightly confused about this discussion. I worked on all of the
>> aforementioned PRs: the module PR that has been merged, the current PR
>> that introduces the Graph API and the PoC PR that contains the full
>> implementation. The issues around shading were addressed and the module PR
>> eventually got merged. Two PMC members including the SPIP shepherd are
>> working with me (and others) on the current API PR. The SPIP to bring
>> Spark Graph into Apache Spark itself has been successfully voted on
>> earlier this year. I presented this work at Spark Summit in San Fransisco
>> in May and was asked by the organizers to present the topic at the
>> European Spark Summit. I'm currently sitting in the speakers room of that
>> conference preparing for the talk and reading this thread. I hope you
>> understand my confusion.
>> 
>> 
>> 
>> I admit - and Xiangrui pointed this out in the other thread, too - that we
>> made the early mistake of not bringing more Spark committers on board
>> which lead to a stagnation period during summer when Xiangrui wasn't
>> around to help review and bring progress to the project.
>> 
>> 
>> 
>> Sean, if your concern is the lack of maintainers of that module, I
>> personally would like to volunteer to maintain Spark Graph. I'm also a
>> contributor to the Okapi stack and am able to work on whatever issue might
>> come up on that end including updating dependencies etc. FWIW, Okapi is
>> actively maintained by a team at Neo4j.
>> 
>> 
>> 
>> Best, Martin
>> 
>> 
>> 
>> On Wed, 16 Oct 2019, 4:35 AM Sean Owen < srowen@ gmail. com (
>> srowen@gmail.com ) > wrote:
>> 
>> 
>>> 
>>> 
>>> I do not have a very informed opinion here, so take this with a grain of
>>> salt.
>>> 
>>> 
>>> 
>>> I'd say that we need to either commit a coherent version of this for Spark
>>> 3, or not at all. If it doesn't have support, I'd back out the existing
>>> changes.
>>> I was initially skeptical about how much this needs to be in Spark vs a
>>> third-party package, and that still stands.
>>> 
>>> 
>>> 
>>> The addition of another dependency isn't that big a deal IMHO, but, yes,
>>> it does add something to the maintenance overhead. But that's all the more
>>> true of a new module.
>>> 
>>> 
>>> 
>>> I don't feel strongly about it, but if this isn't obviously getting
>>> support from any committers, can we keep it as a third party library for
>>> now?
>>> 
>>> 
>>> 
>>> On Tue, Oct 15, 2019 at 8:53 PM Weichen Xu < weichen. xu@ databricks. com
(
>>> weichen.xu@databricks.com ) > wrote:
>>> 
>>> 
>>>> 
>>>> 
>>>> Hi Mats Rydberg,
>>>> 
>>>> 
>>>> 
>>>> Although this dependency "org.opencypher:okapi-shade.okapi" was added into
>>>> spark, but Xiangrui raised two concerns (see above mail) about it, so we'd
>>>> better rethink on this and consider whether this is a good choice, so I
>>>> call this vote.
>>>> 
>>>> 
>>>> 
>>>> Thanks!
>>>> 
>>>> 
>>>> 
>>>> On Tue, Oct 15, 2019 at 10:56 PM Mats Rydberg < mats@ neo4j. org. invalid
(
>>>> mats@neo4j.org.invalid ) > wrote:
>>>> 
>>>> 
>>>>> 
>>>>> 
>>>>> Hello Weichen, community
>>>>> 
>>>>> 
>>>>> 
>>>>> I'm sorry, I'm feeling a little bit confused about this vote. Is this
>>>>> about the PR ( https:/ / github. com/ apache/ spark/ pull/ 24490 (
>>>>> https://github.com/apache/spark/pull/24490 ) ) that was merged in early
>>>>> June and introduced the spark-graph module including the okapi-shade
>>>>> dependency?
>>>>> 
>>>>> 
>>>>> 
>>>>> Regarding the okapi-shade dependency which was developed as part of the
>>>>> above PR work, some advice was offered by Scala experts at TripleQuote
>>>>> which helped find a satisfactory solution. The shading mechanism used
is
>>>>> standard and very comparable to a Java library shading solution.
>>>>> 
>>>>> 
>>>>> 
>>>>> The PR you link ( https:/ / github. com/ apache/ spark/ pull/ 24297 (
>>>>> https://github.com/apache/spark/pull/24297 ) ) is not meant for merging.
It
>>>>> is just a proof-of-concept branch containing a full implementation of
the
>>>>> system, which is kept up-to-date with the API discussion on the currently
>>>>> proposed PR: https:/ / github. com/ apache/ spark/ pull/ 24851 (
>>>>> https://github.com/apache/spark/pull/24851 ).
>>>>> 
>>>>> 
>>>>> 
>>>>> Thank you
>>>>> Mats
>>>>> 
>>>>> 
>>>>> 
>>>>> On Tue, Oct 15, 2019 at 10:38 AM Weichen Xu < weichen. xu@ databricks.
com
>>>>> ( weichen.xu@databricks.com ) > wrote:
>>>>> 
>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Hi everyone,
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> I'd like to call a new vote on the issue: should we add dependency
>>>>>> "org.opencypher:okapi-shade.okapi" into spark ? The issue background
is:
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Spark is going to add a big feature "Spark Graph", the prototypical
>>>>>> implementation is here https:/ / github. com/ apache/ spark/ pull/
24297 (
>>>>>> https://github.com/apache/spark/pull/24297 )
>>>>>> which will introduce dependency org.opencypher:okapi-shade.okapi
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Xiangrui already mentioned 2 concerns on this dependency change:
>>>>>> 
>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On the technical side, my main concern is the runtime dependency
on
>>>>>>> org.opencypher:okapi-shade.okapi depends on several Scala libraries.
We
>>>>>>> came out with the solution to shade a few Scala libraries to
avoid
>>>>>>> pollution. However, I'm not super confident that the approach
is
>>>>>>> sustainable for two reasons: a) there exists no proper shading
libraries
>>>>>>> for Scala, 2) We will have to wait for upgrades from those Scala
libraries
>>>>>>> before we can upgrade Spark to use a newer Scala version. So
it would be
>>>>>>> great if some Scala experts can help review the current implementation
and
>>>>>>> help assess the risk.
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> So let's discuss and vote whether this is a good choice. Before this
spark
>>>>>> graph feature to get into spark ASAP, this issue should be resolved
first.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> This vote is open until next Tuseday (Oct. 22).
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> [ ] +1: Accept the proposal
>>>>>> [ ] +0
>>>>>> [ ] -1: I don't think this is a good idea because ...
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Thank you!
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Weichen
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> --------------------------------------------------------------------- To
>>> unsubscribe e-mail: dev-unsubscribe@ spark. apache. org (
>>> dev-unsubscribe@spark.apache.org )
>>> 
>>> 
>> 
>> 
> 
> 
> 
> --------------------------------------------------------------------- To
> unsubscribe e-mail: dev-unsubscribe@ spark. apache. org (
> dev-unsubscribe@spark.apache.org )
> 
> 
>
Mime
View raw message