spark-user mailing list archives

From Andrew Ash <and...@andrewash.com>
Subject Re: How to deal with multidimensional keys?
Date Thu, 02 Jan 2014 23:28:22 GMT
If you had RDD[[i, j, k], value] then you could reduce by j by essentially
mapping j into the key slot, doing the reduce, and then mapping it back:

rdd.map { case ((i, j, k), v) => (j, (i, k, v)) }
   .reduceByKey( ... )
   .map { case (j, (i, k, v)) => ((i, j, k), v) }

It's not pretty, but I've had to use this pattern before too.
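To make the pattern concrete, here is a minimal sketch of the same key-swap on a plain Scala collection, with made-up sample data; on an RDD you would use reduceByKey in place of the groupBy-then-sum shown here, but the reshaping steps are identical:

```scala
// Sample (i, j, k) -> value records; the data is invented for illustration.
val data = Seq(
  ((1, 10, 100), 1.0),
  ((2, 10, 200), 2.0),
  ((1, 20, 100), 3.0)
)

// Step 1: move j into the key slot, keeping the rest as the value.
val byJ = data.map { case ((i, j, k), v) => (j, (i, k, v)) }

// Step 2: reduce over j (here, summing the values per j).
// On an RDD this would be byJ.reduceByKey(...) instead.
val sums = byJ
  .groupBy(_._1)
  .map { case (j, recs) => (j, recs.map(_._2._3).sum) }
```

After these two steps, sums holds one entry per distinct j (here, 10 -> 3.0 and 20 -> 3.0); the final map back to ((i, j, k), v) only makes sense if the reduce function keeps an (i, k) pair around, as in the snippet above.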


On Thu, Jan 2, 2014 at 6:23 PM, Aureliano Buendia <buendia360@gmail.com> wrote:

> Hi,
>
> How is it possible to reduce by multidimensional keys?
>
> For example, if every line is a tuple like:
>
> (i, j, k, value)
>
> or, alternatively:
>
> ((i, j, k), value)
>
> how can Spark handle reducing over j, or k?
>
