spark-dev mailing list archives

From Saisai Shao <sai.sai.s...@gmail.com>
Subject Re: [VOTE][SPARK-25299] SPIP: Shuffle Storage API
Date Mon, 17 Jun 2019 04:41:17 GMT
+1 (binding)

Thanks
Saisai

Imran Rashid <imran@therashids.com> wrote on Sat, Jun 15, 2019 at 3:46 AM:

> +1 (binding)
>
> I think this is a really important feature for spark.
>
> First, there is already a lot of interest in alternative shuffle storage
> in the community, from dynamic allocation in Kubernetes to even just
> improving stability in standard on-premise use of Spark.  However, they're
> often stuck doing this in forks of Spark, and in ways that are not
> maintainable (because they copy-paste many spark internals) or are
> incorrect (for not correctly handling speculative execution & stage
> retries).
>
> Second, I think the specific proposal is good for finding the right
> balance between flexibility and too much complexity, to allow incremental
> improvements.  A lot of work has been put into this already to try to
> figure out which pieces are essential to make alternative shuffle storage
> implementations feasible.
>
> Of course, that means it doesn't include everything imaginable; some
> things still aren't supported, and some will still choose to use the older
> ShuffleManager API to give total control over all of shuffle.  But we know
> there is a reasonable set of things which can be implemented behind the
> API as the first step, and it can continue to evolve.
>
> On Fri, Jun 14, 2019 at 12:13 PM Ilan Filonenko <if56@cornell.edu> wrote:
>
>> +1 (non-binding). This API is versatile and flexible enough to handle
>> Bloomberg's internal use-cases. The ability for us to vary implementation
>> strategies is quite appealing. It is also worth noting the minimal changes
>> to Spark core needed to make it work. This is a very much needed addition
>> to the Spark shuffle story.
>>
>> On Fri, Jun 14, 2019 at 9:59 AM bo yang <bobyangbo@gmail.com> wrote:
>>
>>> +1 This is great work, allowing different sort shuffle write/read
>>> implementations to be plugged in! Also great to see it retain the
>>> current Spark configuration
>>> (spark.shuffle.manager=org.apache.spark.shuffle.YourShuffleManagerImpl).
>>>
>>>
>>> On Thu, Jun 13, 2019 at 2:58 PM Matt Cheah <mcheah@palantir.com> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>>
>>>>
>>>> I would like to call a vote for the SPIP for SPARK-25299
>>>> <https://issues.apache.org/jira/browse/SPARK-25299>, which proposes to
>>>> introduce a pluggable storage API for temporary shuffle data.
>>>>
>>>>
>>>>
>>>> You may find the SPIP document here
>>>> <https://docs.google.com/document/d/1d6egnL6WHOwWZe8MWv3m8n4PToNacdx7n_0iMSWwhCQ/edit>
>>>> .
>>>>
>>>>
>>>>
>>>> The discussion thread for the SPIP was conducted here
>>>> <https://lists.apache.org/thread.html/2fe82b6b86daadb1d2edaef66a2d1c4dd2f45449656098ee38c50079@%3Cdev.spark.apache.org%3E>
>>>> .
>>>>
>>>>
>>>>
>>>> Please vote on whether or not this proposal is agreeable to you.
>>>>
>>>>
>>>>
>>>> Thanks!
>>>>
>>>>
>>>>
>>>> -Matt Cheah
>>>>
>>>
