spark-dev mailing list archives

From dhruve ashar <dhruveas...@gmail.com>
Subject Re: [VOTE][SPARK-25299] SPIP: Shuffle Storage API
Date Tue, 18 Jun 2019 18:57:37 GMT
+1 (non-binding)

On Tue, Jun 18, 2019 at 12:12 PM John Zhuge <john.zhuge@gmail.com> wrote:

> +1 (non-binding)  Great work!
>
> On Tue, Jun 18, 2019 at 6:22 AM Vinoo Ganesh <vganesh@palantir.com> wrote:
>
>> +1 (non-binding).
>>
>>
>>
>> Thanks for pushing this forward, Matt and Yifei.
>>
>>
>>
>> *From: *Felix Cheung <felixcheung_m@hotmail.com>
>> *Date: *Tuesday, June 18, 2019 at 00:01
>> *To: *Yinan Li <liyinan926@gmail.com>, "rblue@netflix.com" <
>> rblue@netflix.com>
>> *Cc: *Dongjoon Hyun <dongjoon.hyun@gmail.com>, Saisai Shao <
>> sai.sai.shao@gmail.com>, Imran Rashid <imran@therashids.com>, Ilan
>> Filonenko <if56@cornell.edu>, bo yang <bobyangbo@gmail.com>,
>> Matt Cheah <mcheah@palantir.com>, Spark Dev List <dev@spark.apache.org>, "Yifei
>> Huang (PD)" <yifeih@palantir.com>, Vinoo Ganesh <vganesh@palantir.com>,
>> Imran Rashid <irashid@cloudera.com>
>> *Subject: *Re: [VOTE][SPARK-25299] SPIP: Shuffle Storage API
>>
>>
>>
>> +1
>>
>>
>>
>> Glad to see the progress in this space - it’s been more than a year since
>> the original discussion and effort started.
>>
>>
>> ------------------------------
>>
>> *From:* Yinan Li <liyinan926@gmail.com>
>> *Sent:* Monday, June 17, 2019 7:14:42 PM
>> *To:* rblue@netflix.com
>> *Cc:* Dongjoon Hyun; Saisai Shao; Imran Rashid; Ilan Filonenko; bo yang;
>> Matt Cheah; Spark Dev List; Yifei Huang (PD); Vinoo Ganesh; Imran Rashid
>> *Subject:* Re: [VOTE][SPARK-25299] SPIP: Shuffle Storage API
>>
>>
>>
>> +1 (non-binding)
>>
>>
>>
>> On Mon, Jun 17, 2019 at 1:58 PM Ryan Blue <rblue@netflix.com.invalid>
>> wrote:
>>
>> +1 (non-binding)
>>
>>
>>
>> On Sun, Jun 16, 2019 at 11:11 PM Dongjoon Hyun <dongjoon.hyun@gmail.com>
>> wrote:
>>
>> +1
>>
>>
>>
>> Bests,
>>
>> Dongjoon.
>>
>>
>>
>>
>>
>> On Sun, Jun 16, 2019 at 9:41 PM Saisai Shao <sai.sai.shao@gmail.com>
>> wrote:
>>
>> +1 (binding)
>>
>>
>>
>> Thanks
>>
>> Saisai
>>
>>
>>
>> On Sat, Jun 15, 2019 at 3:46 AM Imran Rashid <imran@therashids.com> wrote:
>>
>> +1 (binding)
>>
>> I think this is a really important feature for Spark.
>>
>> First, there is already a lot of interest in alternative shuffle storage
>> in the community, from dynamic allocation in Kubernetes to even just
>> improving stability in standard on-premise use of Spark.  However, people
>> are often stuck doing this in forks of Spark, and in ways that are not
>> maintainable (because they copy-paste many Spark internals) or are
>> incorrect (for not correctly handling speculative execution & stage
>> retries).
>>
>> Second, I think the specific proposal is good for finding the right
>> balance between flexibility and too much complexity, to allow incremental
>> improvements.  A lot of work has been put into this already to try to
>> figure out which pieces are essential to make alternative shuffle storage
>> implementations feasible.
>>
>> Of course, that means it doesn't include everything imaginable; some
>> things still aren't supported, and some people will still choose to use the
>> older ShuffleManager API to get total control over all of shuffle.  But we
>> know there is a reasonable set of things which can be implemented behind
>> the API as the first step, and it can continue to evolve.
>>
>>
>>
>> On Fri, Jun 14, 2019 at 12:13 PM Ilan Filonenko <if56@cornell.edu> wrote:
>>
>> +1 (non-binding). This API is versatile and flexible enough to handle
>> Bloomberg's internal use-cases. The ability for us to vary implementation
>> strategies is quite appealing. It is also worth noting how minimal the
>> changes to Spark core are in order to make it work. This is a much-needed
>> addition to the Spark shuffle story.
>>
>>
>>
>> On Fri, Jun 14, 2019 at 9:59 AM bo yang <bobyangbo@gmail.com> wrote:
>>
>> +1 This is great work, allowing different sort shuffle write/read
>> implementations to be plugged in! Also great to see it retain the current
>> Spark configuration
>> (spark.shuffle.manager=org.apache.spark.shuffle.YourShuffleManagerImpl).
>>
>>
>>
>>
>>
>> On Thu, Jun 13, 2019 at 2:58 PM Matt Cheah <mcheah@palantir.com> wrote:
>>
>> Hi everyone,
>>
>>
>>
>> I would like to call a vote for the SPIP for SPARK-25299
>> <https://issues.apache.org/jira/browse/SPARK-25299>,
>> which proposes to introduce a pluggable storage API for temporary shuffle
>> data.
>>
>>
>>
>> You may find the SPIP document here:
>> <https://docs.google.com/document/d/1d6egnL6WHOwWZe8MWv3m8n4PToNacdx7n_0iMSWwhCQ/edit>
>>
>>
>>
>> The discussion thread for the SPIP was conducted here:
>> <https://lists.apache.org/thread.html/2fe82b6b86daadb1d2edaef66a2d1c4dd2f45449656098ee38c50079@%3Cdev.spark.apache.org%3E>
>>
>>
>>
>> Please vote on whether or not this proposal is agreeable to you.
>>
>>
>>
>> Thanks!
>>
>>
>>
>> -Matt Cheah
>>
>>
>>
>>
>> --
>>
>> Ryan Blue
>>
>> Software Engineer
>>
>> Netflix
>>
>>
>
> --
> John
>


-- 
-Dhruve Ashar
