spark-user mailing list archives

From "K. Shankari" <shank...@eecs.berkeley.edu>
Subject Re: How to deal with multidimensional keys?
Date Fri, 03 Jan 2014 02:03:01 GMT
I have had to use this as well.

Sometimes, I create a POJO to hold the multi-dimensional key to make things
easier.

i.e.

case class MultiKey(i: Int, j: Int, k: Int)

then I can define a reduce function that is over the multikey, e.g.

def reduceByI(mkv1: (MultiKey, Value), mkv2: (MultiKey, Value)) =
  if (mkv1._1.i > mkv2._1.i) mkv1 else mkv2

and then I can do

rdd.reduce(reduceByI)
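A minimal, self-contained sketch of this pattern, with plain Scala collections standing in for the RDD; the `MultiKey` field types, the `Value` alias, and the sample data are assumptions for illustration:

```scala
object MultiKeyDemo {
  // Hypothetical multi-dimensional key; Int fields assumed for illustration
  case class MultiKey(i: Int, j: Int, k: Int)
  type Value = Double // assumed value type

  // Keep whichever pair has the larger i component
  def reduceByI(mkv1: (MultiKey, Value), mkv2: (MultiKey, Value)): (MultiKey, Value) =
    if (mkv1._1.i > mkv2._1.i) mkv1 else mkv2

  def main(args: Array[String]): Unit = {
    // Seq.reduce stands in for rdd.reduce on a small sample
    val pairs = Seq(
      (MultiKey(1, 2, 3), 10.0),
      (MultiKey(5, 0, 0), 2.0),
      (MultiKey(3, 1, 1), 7.0)
    )
    val winner = pairs.reduce(reduceByI) // pair with the largest i survives
    println(winner)
  }
}
```

Note that this folds the whole dataset down to a single pair; to keep one result per distinct j or k you would group by that component first, as in the quoted suggestion below.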

Thanks,
Shankari


On Thu, Jan 2, 2014 at 3:28 PM, Andrew Ash <andrew@andrewash.com> wrote:

> If you had RDD[((i, j, k), value)] then you could reduce by j by essentially
> mapping j into the key slot, doing the reduce, and then mapping it back:
>
> rdd.map { case ((i,j,k),v) => (j, (i,k,v)) }.reduceByKey( ... ).map { case
> (j,(i,k,v)) => ((i,j,k),v) }
>
> It's not pretty, but I've had to use this pattern before too.
>
>
> On Thu, Jan 2, 2014 at 6:23 PM, Aureliano Buendia <buendia360@gmail.com> wrote:
>
>> Hi,
>>
>> How is it possible to reduce by multidimensional keys?
>>
>> For example, if every line is a tuple like:
>>
>> (i, j, k, value)
>>
>> or, alternatively:
>>
>> ((i, j, k), value)
>>
>> how can Spark handle reducing over j, or k?
>>
>
>
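
The key-swap pattern quoted above can be sketched with plain Scala collections, using groupBy plus a per-group reduce as a stand-in for Spark's reduceByKey; the sample records and the "keep the largest value" combiner are made up for illustration:

```scala
object KeySwapDemo {
  // Reduce ((i, j, k), value) records over j, keeping the record with the
  // largest value for each j.
  def reduceByJ(data: Seq[((Int, Int, Int), Double)]): Map[(Int, Int, Int), Double] = {
    // 1. map j into the key slot
    val byJ = data.map { case ((i, j, k), v) => (j, (i, k, v)) }
    // 2. groupBy + reduce stands in for Spark's reduceByKey
    val reduced = byJ.groupBy(_._1).map { case (j, grp) =>
      (j, grp.map(_._2).reduce((a, b) => if (a._3 > b._3) a else b))
    }
    // 3. map j back out of the key slot
    reduced.map { case (j, (i, k, v)) => ((i, j, k), v) }
  }

  def main(args: Array[String]): Unit =
    reduceByJ(Seq(((1, 1, 1), 10.0), ((2, 1, 3), 5.0), ((1, 2, 1), 7.0)))
      .foreach(println)
}
```

The same shape works for k by swapping k into the key slot instead of j.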
