spark-dev mailing list archives

From Imran Rashid <im...@therashids.com>
Subject Re: [VOTE][SPARK-25299] SPIP: Shuffle Storage API
Date Fri, 14 Jun 2019 19:45:50 GMT
 +1 (binding)

I think this is a really important feature for Spark.

First, there is already a lot of interest in alternative shuffle storage in
the community, from dynamic allocation in Kubernetes to simply improving
stability in standard on-premise use of Spark.  However, the people working
on this are often stuck doing it in forks of Spark, and in ways that are
either not maintainable (because they copy-paste many Spark internals) or
incorrect (because they do not correctly handle speculative execution and
stage retries).

Second, I think the specific proposal strikes a good balance between
flexibility and complexity, which allows for incremental improvements.  A
lot of work has already been put into figuring out which pieces are
essential to make alternative shuffle storage implementations feasible.

Of course, that means it doesn't include everything imaginable; some things
still aren't supported, and some users will still choose the older
ShuffleManager API to get total control over all of shuffle.  But we know
there is a reasonable set of things which can be implemented behind the
API as a first step, and it can continue to evolve.

On Fri, Jun 14, 2019 at 12:13 PM Ilan Filonenko <if56@cornell.edu> wrote:

> +1 (non-binding). This API is versatile and flexible enough to handle
> Bloomberg's internal use-cases. The ability for us to vary implementation
> strategies is quite appealing. It is also worth noting the minimal changes
> to Spark core in order to make it work. This is a very much needed addition
> within the Spark shuffle story.
>
> On Fri, Jun 14, 2019 at 9:59 AM bo yang <bobyangbo@gmail.com> wrote:
>
>> +1 This is great work, allowing different sort shuffle write/read
>> implementations to be plugged in! Also great to see it retain the current
>> Spark configuration
>> (spark.shuffle.manager=org.apache.spark.shuffle.YourShuffleManagerImpl).
>>
>>
>> On Thu, Jun 13, 2019 at 2:58 PM Matt Cheah <mcheah@palantir.com> wrote:
>>
>>> Hi everyone,
>>>
>>> I would like to call a vote for the SPIP for SPARK-25299
>>> <https://issues.apache.org/jira/browse/SPARK-25299>, which proposes to
>>> introduce a pluggable storage API for temporary shuffle data.
>>>
>>> You may find the SPIP document here
>>> <https://docs.google.com/document/d/1d6egnL6WHOwWZe8MWv3m8n4PToNacdx7n_0iMSWwhCQ/edit>.
>>>
>>> The discussion thread for the SPIP was conducted here
>>> <https://lists.apache.org/thread.html/2fe82b6b86daadb1d2edaef66a2d1c4dd2f45449656098ee38c50079@%3Cdev.spark.apache.org%3E>.
>>>
>>> Please vote on whether or not this proposal is agreeable to you.
>>>
>>> Thanks!
>>>
>>> -Matt Cheah
>>>
>>
