spark-user mailing list archives

From Meihua Wu <rotationsymmetr...@gmail.com>
Subject Re: Does RDD.cartesian involve shuffling?
Date Tue, 04 Aug 2015 16:25:55 GMT
Thanks, Richard!

I have two RDDs, A and B, and I need to compute a value for
every pair (a, b) with a in A and b in B. My first thought was
cartesian, but it involves an expensive shuffle.

Any alternatives? What if I convert B to an array and broadcast it
to every node (assuming B is small enough to fit in memory)?
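
To make the broadcast idea concrete, here is a rough sketch in Python of what I have in mind. It is only an illustration under the assumption that B can be collected on the driver and held in memory on each executor. The names pair_values, broadcast_cartesian and f are made up for this example; sc is assumed to be an existing SparkContext.

```python
def pair_values(a_partition, b_values, f):
    """Yield f(a, b) for every a in one partition of A and every b in B.

    a_partition is streamed once; b_values must be a re-iterable
    collection (for example a list), since it is scanned once per a.
    """
    for a in a_partition:
        for b in b_values:
            yield f(a, b)


def broadcast_cartesian(sc, rdd_a, rdd_b, f):
    """Compute f(a, b) for all pairs without calling rdd_a.cartesian(rdd_b).

    Assumption: rdd_b is small enough to collect on the driver and to
    keep in memory on every executor.
    """
    # Collect B on the driver and ship it to the executors as a broadcast variable.
    b_bc = sc.broadcast(rdd_b.collect())
    # Each partition of A is paired locally with the broadcast copy of B.
    return rdd_a.mapPartitions(lambda part: pair_values(part, b_bc.value, f))
```

With this, something like broadcast_cartesian(sc, A, B, lambda a, b: a * b) would return an RDD of the per-pair values, and A keeps its existing partitioning. If B does not fit in memory this approach would not apply.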



On Tue, Aug 4, 2015 at 8:23 AM, Richard Marscher
<rmarscher@localytics.com> wrote:
> Yes it does, in fact it's probably going to be one of the more expensive
> shuffles you could trigger.
>
> On Mon, Aug 3, 2015 at 12:56 PM, Meihua Wu <rotationsymmetry14@gmail.com>
> wrote:
>>
>> Does RDD.cartesian involve shuffling?
>>
>> Thanks!
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>> For additional commands, e-mail: user-help@spark.apache.org
>>
>
>
>
> --
> Richard Marscher
> Software Engineer
> Localytics


