spark-dev mailing list archives

From John Zhuge <john.zh...@gmail.com>
Subject Re: [VOTE][SPARK-25299] SPIP: Shuffle Storage API
Date Tue, 18 Jun 2019 17:12:10 GMT
+1 (non-binding)  Great work!

On Tue, Jun 18, 2019 at 6:22 AM Vinoo Ganesh <vganesh@palantir.com> wrote:

> +1 (non-binding).
>
>
>
> Thanks for pushing this forward, Matt and Yifei.
>
>
>
> *From: *Felix Cheung <felixcheung_m@hotmail.com>
> *Date: *Tuesday, June 18, 2019 at 00:01
> *To: *Yinan Li <liyinan926@gmail.com>, "rblue@netflix.com" <
> rblue@netflix.com>
> *Cc: *Dongjoon Hyun <dongjoon.hyun@gmail.com>, Saisai Shao <
> sai.sai.shao@gmail.com>, Imran Rashid <imran@therashids.com>, Ilan
> Filonenko <if56@cornell.edu>, bo yang <bobyangbo@gmail.com>, Matt Cheah <
> mcheah@palantir.com>, Spark Dev List <dev@spark.apache.org>, "Yifei Huang
> (PD)" <yifeih@palantir.com>, Vinoo Ganesh <vganesh@palantir.com>, Imran
> Rashid <irashid@cloudera.com>
> *Subject: *Re: [VOTE][SPARK-25299] SPIP: Shuffle Storage API
>
>
>
> +1
>
>
>
> Glad to see the progress in this space - it’s been more than a year since
> the original discussion and effort started.
>
>
> ------------------------------
>
> *From:* Yinan Li <liyinan926@gmail.com>
> *Sent:* Monday, June 17, 2019 7:14:42 PM
> *To:* rblue@netflix.com
> *Cc:* Dongjoon Hyun; Saisai Shao; Imran Rashid; Ilan Filonenko; bo yang;
> Matt Cheah; Spark Dev List; Yifei Huang (PD); Vinoo Ganesh; Imran Rashid
> *Subject:* Re: [VOTE][SPARK-25299] SPIP: Shuffle Storage API
>
>
>
> +1 (non-binding)
>
>
>
> On Mon, Jun 17, 2019 at 1:58 PM Ryan Blue <rblue@netflix.com.invalid>
> wrote:
>
> +1 (non-binding)
>
>
>
> On Sun, Jun 16, 2019 at 11:11 PM Dongjoon Hyun <dongjoon.hyun@gmail.com>
> wrote:
>
> +1
>
>
>
> Bests,
>
> Dongjoon.
>
>
>
>
>
> On Sun, Jun 16, 2019 at 9:41 PM Saisai Shao <sai.sai.shao@gmail.com>
> wrote:
>
> +1 (binding)
>
>
>
> Thanks
>
> Saisai
>
>
>
> On Sat, Jun 15, 2019 at 3:46 AM Imran Rashid <imran@therashids.com> wrote:
>
> +1 (binding)
>
> I think this is a really important feature for spark.
>
> First, there is already a lot of interest in alternative shuffle storage
> in the community, from dynamic allocation in kubernetes, to even just
> improving stability in standard on-premise use of Spark.  However, they're
> often stuck doing this in forks of Spark, and in ways that are not
> maintainable (because they copy-paste many spark internals) or are
> incorrect (for not correctly handling speculative execution & stage
> retries).
>
> Second, I think the specific proposal is good for finding the right
> balance between flexibility and too much complexity, to allow incremental
> improvements.  A lot of work has been put into this already to try to
> figure out which pieces are essential to make alternative shuffle storage
> implementations feasible.
>
> Of course, that means it doesn't include everything imaginable; some
> things still aren't supported, and some will still choose to use the older
> ShuffleManager api to give total control over all of shuffle.  But we know
> there is a reasonable set of things which can be implemented behind the
> api as the first step, and it can continue to evolve.
>
>
>
> On Fri, Jun 14, 2019 at 12:13 PM Ilan Filonenko <if56@cornell.edu> wrote:
>
> +1 (non-binding). This API is versatile and flexible enough to handle
> Bloomberg's internal use-cases. The ability for us to vary implementation
> strategies is quite appealing. It is also worth noting the minimal changes
> to Spark core needed to make it work. This is a much-needed addition
> within the Spark shuffle story.
>
>
>
> On Fri, Jun 14, 2019 at 9:59 AM bo yang <bobyangbo@gmail.com> wrote:
>
> +1 This is great work, allowing different sort shuffle write/read
> implementations to be plugged in! Also great to see it retain the current
> Spark configuration
> (spark.shuffle.manager=org.apache.spark.shuffle.YourShuffleManagerImpl).
>
>
>
>
>
> On Thu, Jun 13, 2019 at 2:58 PM Matt Cheah <mcheah@palantir.com> wrote:
>
> Hi everyone,
>
>
>
> I would like to call a vote for the SPIP for SPARK-25299
> <https://issues.apache.org/jira/browse/SPARK-25299>,
> which proposes to introduce a pluggable storage API for temporary shuffle
> data.
>
>
>
> You may find the SPIP document here
> <https://docs.google.com/document/d/1d6egnL6WHOwWZe8MWv3m8n4PToNacdx7n_0iMSWwhCQ/edit>.
>
>
>
> The discussion thread for the SPIP was conducted here
> <https://lists.apache.org/thread.html/2fe82b6b86daadb1d2edaef66a2d1c4dd2f45449656098ee38c50079@%3Cdev.spark.apache.org%3E>.
>
>
>
> Please vote on whether or not this proposal is agreeable to you.
>
>
>
> Thanks!
>
>
>
> -Matt Cheah
>
>
>
>
> --
>
> Ryan Blue
>
> Software Engineer
>
> Netflix
>
>

-- 
John
