spark-dev mailing list archives

From Dongjoon Hyun <dongjoon.h...@gmail.com>
Subject Re: [VOTE][SPARK-25299] SPIP: Shuffle Storage API
Date Mon, 17 Jun 2019 06:10:48 GMT
+1

Bests,
Dongjoon.


On Sun, Jun 16, 2019 at 9:41 PM Saisai Shao <sai.sai.shao@gmail.com> wrote:

> +1 (binding)
>
> Thanks
> Saisai
>
> On Sat, Jun 15, 2019 at 3:46 AM Imran Rashid <imran@therashids.com> wrote:
>
>> +1 (binding)
>>
>> I think this is a really important feature for Spark.
>>
>> First, there is already a lot of interest in the community in alternative
>> shuffle storage, from dynamic allocation in Kubernetes to even just
>> improving stability in standard on-premise use of Spark.  However, people
>> are often stuck doing this in forks of Spark, and in ways that are not
>> maintainable (because they copy-paste many Spark internals) or are
>> incorrect (for not correctly handling speculative execution & stage
>> retries).
>>
>> Second, I think the specific proposal is good for finding the right
>> balance between flexibility and too much complexity, to allow incremental
>> improvements.  A lot of work has been put into this already to try to
>> figure out which pieces are essential to make alternative shuffle storage
>> implementations feasible.
>>
>> Of course, that means it doesn't include everything imaginable; some
>> things still aren't supported, and some will still choose to use the older
>> ShuffleManager API to get total control over all of shuffle.  But we know
>> there is a reasonable set of things which can be implemented behind the
>> API as the first step, and it can continue to evolve.
>>
>> On Fri, Jun 14, 2019 at 12:13 PM Ilan Filonenko <if56@cornell.edu> wrote:
>>
>>> +1 (non-binding). This API is versatile and flexible enough to handle
>>> Bloomberg's internal use cases. The ability for us to vary implementation
>>> strategies is quite appealing. It is also worth noting the minimal changes
>>> to Spark core needed to make it work. This is a much-needed addition
>>> to the Spark shuffle story.
>>>
>>> On Fri, Jun 14, 2019 at 9:59 AM bo yang <bobyangbo@gmail.com> wrote:
>>>
>>>> +1 This is great work, allowing different sort shuffle write/read
>>>> implementations to be plugged in! Also great to see it retain the current
>>>> Spark configuration
>>>> (spark.shuffle.manager=org.apache.spark.shuffle.YourShuffleManagerImpl).
>>>>
>>>>
>>>> On Thu, Jun 13, 2019 at 2:58 PM Matt Cheah <mcheah@palantir.com> wrote:
>>>>
>>>>> Hi everyone,
>>>>>
>>>>> I would like to call a vote for the SPIP for SPARK-25299
>>>>> <https://issues.apache.org/jira/browse/SPARK-25299>, which proposes
>>>>> to introduce a pluggable storage API for temporary shuffle data.
>>>>>
>>>>> You may find the SPIP document here
>>>>> <https://docs.google.com/document/d/1d6egnL6WHOwWZe8MWv3m8n4PToNacdx7n_0iMSWwhCQ/edit>.
>>>>>
>>>>> The discussion thread for the SPIP was conducted here
>>>>> <https://lists.apache.org/thread.html/2fe82b6b86daadb1d2edaef66a2d1c4dd2f45449656098ee38c50079@%3Cdev.spark.apache.org%3E>.
>>>>>
>>>>> Please vote on whether or not this proposal is agreeable to you.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> -Matt Cheah
>>>>>
>>>>
