spark-dev mailing list archives

From Tom Graves <tgraves...@yahoo.com.INVALID>
Subject Re: [VOTE][SPARK-25299] SPIP: Shuffle Storage API
Date Fri, 21 Jun 2019 17:02:10 GMT
 +1 (binding)
I haven't looked at the low-level API, but I like the idea and the approach to get it started.
Tom
    On Tuesday, June 18, 2019, 10:40:34 PM CDT, Guo, Chenzhao <chenzhao.guo@intel.com>
wrote:  
 
Cool : )
 
  
 
+1 (non-binding)
 
  
 
Chenzhao
 
  
 
From: dhruve ashar [mailto:dhruveashar@gmail.com]
Sent: Wednesday, June 19, 2019 2:58 AM
To: John Zhuge <john.zhuge@gmail.com>
Cc: Vinoo Ganesh <vganesh@palantir.com>; Felix Cheung <felixcheung_m@hotmail.com>;
Yinan Li <liyinan926@gmail.com>; rblue@netflix.com; Dongjoon Hyun <dongjoon.hyun@gmail.com>;
Saisai Shao <sai.sai.shao@gmail.com>; Imran Rashid <imran@therashids.com>; Ilan
Filonenko <if56@cornell.edu>; bo yang <bobyangbo@gmail.com>; Matt Cheah <mcheah@palantir.com>;
Spark Dev List <dev@spark.apache.org>; Yifei Huang (PD) <yifeih@palantir.com>;
Imran Rashid <irashid@cloudera.com>
Subject: Re: [VOTE][SPARK-25299] SPIP: Shuffle Storage API
 
  
 
+1 (non-binding)
 
  
 
On Tue, Jun 18, 2019 at 12:12 PM John Zhuge <john.zhuge@gmail.com> wrote:
 

+1 (non-binding)  Great work!
 
  
 
On Tue, Jun 18, 2019 at 6:22 AM Vinoo Ganesh <vganesh@palantir.com> wrote:
 

+1 (non-binding).
 
 
 
Thanks for pushing this forward, Matt and Yifei.
 
 
 
From: Felix Cheung <felixcheung_m@hotmail.com>
Date: Tuesday, June 18, 2019 at 00:01
To: Yinan Li <liyinan926@gmail.com>, "rblue@netflix.com" <rblue@netflix.com>
Cc: Dongjoon Hyun <dongjoon.hyun@gmail.com>, Saisai Shao <sai.sai.shao@gmail.com>,
Imran Rashid <imran@therashids.com>, Ilan Filonenko <if56@cornell.edu>, bo yang
<bobyangbo@gmail.com>, Matt Cheah <mcheah@palantir.com>, Spark Dev List <dev@spark.apache.org>,
"Yifei Huang (PD)" <yifeih@palantir.com>, Vinoo Ganesh <vganesh@palantir.com>,
Imran Rashid <irashid@cloudera.com>
Subject: Re: [VOTE][SPARK-25299] SPIP: Shuffle Storage API
 
 
 
+1
 
 
 
Glad to see the progress in this space - it’s been more than a year since the original discussion
and effort started.
 
 
 
From: Yinan Li <liyinan926@gmail.com>
Sent: Monday, June 17, 2019 7:14:42 PM
To: rblue@netflix.com
Cc: Dongjoon Hyun; Saisai Shao; Imran Rashid; Ilan Filonenko; bo yang; Matt Cheah; Spark Dev
List; Yifei Huang (PD); Vinoo Ganesh; Imran Rashid
Subject: Re: [VOTE][SPARK-25299] SPIP: Shuffle Storage API 
 
 
 
+1 (non-binding) 
 
 
 
On Mon, Jun 17, 2019 at 1:58 PM Ryan Blue <rblue@netflix.com.invalid> wrote:
 

+1 (non-binding)
 
 
 
On Sun, Jun 16, 2019 at 11:11 PM Dongjoon Hyun <dongjoon.hyun@gmail.com> wrote:
 

+1
 
 
 
Bests,
 
Dongjoon.
 
 
 
 
 
On Sun, Jun 16, 2019 at 9:41 PM Saisai Shao <sai.sai.shao@gmail.com> wrote:
 

+1 (binding)
 
 
 
Thanks
 
Saisai
 
 
 
On Sat, Jun 15, 2019 at 3:46 AM Imran Rashid <imran@therashids.com> wrote:
 

+1 (binding)

I think this is a really important feature for Spark.

First, there is already a lot of interest in alternative shuffle storage in the community, from dynamic allocation
in Kubernetes to simply improving stability in standard on-premise use of Spark.  However,
people are often stuck doing this in forks of Spark, in ways that are either not maintainable (because
they copy-paste many Spark internals) or incorrect (because they do not correctly handle speculative
execution and stage retries).

Second, I think the specific proposal is good for finding the right balance between flexibility
and too much complexity, to allow incremental improvements.  A lot of work has been put into
this already to try to figure out which pieces are essential to make alternative shuffle storage
implementations feasible.

Of course, that means it doesn't include everything imaginable; some things still aren't supported,
and some users will still choose the older ShuffleManager API to retain total control over
all of shuffle.  But we know there is a reasonable set of things that can be implemented
behind the API as a first step, and it can continue to evolve.
 
 
 
On Fri, Jun 14, 2019 at 12:13 PM Ilan Filonenko <if56@cornell.edu> wrote:
 

+1 (non-binding). This API is versatile and flexible enough to handle Bloomberg's internal
use cases. The ability for us to vary implementation strategies is quite appealing. It is
also worth noting the minimal changes to Spark core needed to make it work. This is a very
much needed addition to the Spark shuffle story. 
 
 
 
On Fri, Jun 14, 2019 at 9:59 AM bo yang <bobyangbo@gmail.com> wrote:
 

+1 This is great work, allowing different sort shuffle write/read implementations to be plugged in!
Also great to see it retain the current Spark configuration (spark.shuffle.manager=org.apache.spark.shuffle.YourShuffleManagerImpl).
 
 
 
 
 
On Thu, Jun 13, 2019 at 2:58 PM Matt Cheah <mcheah@palantir.com> wrote:
 

Hi everyone,
 
 
 
I would like to call a vote for the SPIP for SPARK-25299 [issues.apache.org], which proposes
to introduce a pluggable storage API for temporary shuffle data.
 
 
 
You may find the SPIP document here [docs.google.com].
 
 
 
The discussion thread for the SPIP was conducted here [lists.apache.org].
 
 
 
Please vote on whether or not this proposal is agreeable to you.
 
 
 
Thanks!
 
 
 
-Matt Cheah
 








 
 
 
--
 
Ryan Blue
 
Software Engineer
 
Netflix
 




 
  
 
-- 
 
John
 



-- 
 
-Dhruve Ashar
 
  
   