spark-user mailing list archives

From Yanbo <yanboha...@gmail.com>
Subject Re: ReduceByKey but with different functions depending on key
Date Tue, 18 Nov 2014 15:24:11 GMT
First, use groupByKey(); it gives you a pair RDD of (key: K, values: Iterable[V]).
Then call map() on this RDD with a function that takes the key as a parameter
and applies a different operation depending on its value.
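A minimal sketch of the idea in plain Java, so it runs without a Spark cluster. The grouped RDD is stood in for by an in-memory Map from key to Iterable. The dispatch rule (even keys are summed, odd keys take the max, over Integer values instead of MyClass) is purely hypothetical, as are the names reducerFor, reduceGroup and reduceAll. The point is that the key-to-function choice happens inside a function that can see the key. The function passed to reduceByKey cannot do this, because it only receives two values.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BinaryOperator;

public class KeyedReduceSketch {

    // Hypothetical reducers; in the original question these would be
    // reduce functions over MyClass rather than Integer.
    static final BinaryOperator<Integer> SUM = Integer::sum;
    static final BinaryOperator<Integer> MAX = Math::max;

    // Hypothetical dispatch rule: even keys are summed, all other keys take the max.
    static BinaryOperator<Integer> reducerFor(int key) {
        if (key % 2 == 0) {
            return SUM;
        }
        return MAX;
    }

    // Reduces one group with the function chosen for its key.
    // Groups produced by groupBy/groupByKey are never empty; an empty group is rejected.
    static int reduceGroup(int key, Iterable<Integer> values) {
        BinaryOperator<Integer> op = reducerFor(key);
        Integer acc = null;
        for (Integer v : values) {
            acc = (acc == null) ? v : op.apply(acc, v);
        }
        if (acc == null) {
            throw new IllegalArgumentException("empty group for key " + key);
        }
        return acc;
    }

    // Local stand-in for the map step over (key, Iterable<V>) pairs.
    // With Spark's Java API this would be roughly (untested here):
    //   grouped.mapToPair(t -> new Tuple2<>(t._1(), reduceGroup(t._1(), t._2())));
    static Map<Integer, Integer> reduceAll(Map<Integer, ? extends Iterable<Integer>> grouped) {
        Map<Integer, Integer> out = new TreeMap<>();
        for (Map.Entry<Integer, ? extends Iterable<Integer>> e : grouped.entrySet()) {
            out.put(e.getKey(), reduceGroup(e.getKey(), e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> grouped = new TreeMap<>();
        grouped.put(2, Arrays.asList(1, 2, 3)); // even key: summed
        grouped.put(3, Arrays.asList(4, 9, 5)); // odd key: max
        System.out.println(reduceAll(grouped));
    }
}
```

Running main prints {2=6, 3=9}: key 2 is reduced with the sum and key 3 with the max.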


> On 18 Nov 2014, at 8:59 PM, jelgh <johannes.elgh@gmail.com> wrote:
> 
> Hello everyone,
> 
> I'm new to Spark and I have the following problem:
> 
> I have this large JavaRDD<MyClass> collection, which I group by
> creating a hashcode from some fields in MyClass:
> 
> JavaRDD<MyClass> collection = ...;
> JavaPairRDD<Integer, Iterable<MyClass>> grouped =
>     collection.groupBy(...); // the group function just creates a hashcode
>                              // from some fields in MyClass
> 
> Now I want to reduce the variable grouped. However, I want to reduce it with
> a different function depending on the key in the JavaPairRDD. So basically a
> reduceByKey, but with multiple functions.
> 
> The only solution I've come up with is to filter grouped once for each reduce
> function and apply that function to the filtered subset. This feels kinda
> hackish, though.
> 
> Is there a better way? 
> 
> Best regards,
> Johannes
> 
> 
> 
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/ReduceByKey-but-with-different-functions-depending-on-key-tp19177.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
> 
