spark-user mailing list archives

From Mohit Jaggi
Subject RDD.combineBy
Date Tue, 27 Jan 2015 21:15:50 GMT
Hi All,
I have a use case where I want to do a combineByKey() operation on an RDD that is not an
RDD of (k, v) pairs. I can do that by creating an intermediate RDD of (k, v) pairs and using
PairRDDFunctions.combineByKey(). However, I believe it would be more efficient to avoid this
intermediate RDD. Is there a way to do this by passing in a function that extracts the key,
as in RDD.groupBy()?
[Oops, RDD.groupBy() seems to create the intermediate RDD anyway; maybe a better implementation
is possible for that too?]
If not, is it worth adding to the Spark API?
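
To make the question concrete, here is a minimal sketch of the semantics being asked about, written in plain Python rather than against Spark. The names combine_by and key_fn are hypothetical and are not part of the Spark API; partitions are modeled as ordinary lists. The three combiner functions mirror the createCombiner, mergeValue and mergeCombiners arguments of combineByKey(), except that they receive the whole element because there is no separate value.

```python
def combine_by(partitions, key_fn, create_combiner, merge_value, merge_combiners):
    """Hypothetical combineBy: like combineByKey, but the key is computed
    from each element by key_fn instead of being stored in a (k, v) pair."""
    # "Map side": build one combiner per key inside each partition,
    # without first materializing (key, element) pairs.
    per_partition = []
    for part in partitions:
        combiners = {}
        for elem in part:
            k = key_fn(elem)
            if k in combiners:
                combiners[k] = merge_value(combiners[k], elem)
            else:
                combiners[k] = create_combiner(elem)
        per_partition.append(combiners)

    # "Reduce side": merge the per-partition combiners that share a key.
    result = {}
    for combiners in per_partition:
        for k, c in combiners.items():
            result[k] = merge_combiners(result[k], c) if k in result else c
    return result


# Example: (total length, count) of words grouped by first letter.
partitions = [["apple", "avocado", "banana"], ["blueberry", "apricot"]]
sums = combine_by(
    partitions,
    key_fn=lambda w: w[0],
    create_combiner=lambda w: (len(w), 1),
    merge_value=lambda acc, w: (acc[0] + len(w), acc[1] + 1),
    merge_combiners=lambda a, b: (a[0] + b[0], a[1] + b[1]),
)
```

This only illustrates the intended result; it says nothing about whether a real implementation could skip the intermediate pair RDD, since the shuffle would presumably still need the key alongside each combiner.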
