spark-user mailing list archives

From Debasish Das <debasish.da...@gmail.com>
Subject Re: ReduceByKey but with different functions depending on key
Date Tue, 18 Nov 2014 15:42:02 GMT
groupByKey does not run a combiner, so be careful about performance:
it shuffles every value, even for groups that are local to a partition.

reduceByKey and aggregateByKey do run a combiner. If you want a separate
function for each key, you can build a key-to-closure map, broadcast it,
and use it inside reduceByKey/aggregateByKey, provided you have access to
the key there.

I have not needed to access the key in reduceByKey/aggregateByKey yet,
but there should be a way.
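
One possible way, given as a minimal sketch in plain Java rather than Spark itself: the function passed to reduceByKey only receives two values, so the key can be carried inside each value, and the merge function can then look up the per-key function in the table. The class name PerKeyReduce, the REDUCERS table, the sample keys 1 and 2, and the helpers pair/reduceByKey are all hypothetical. In a Spark job the table would be the broadcast map, and merge would be the function handed to reduceByKey after a mapToPair that emits (key, (key, value)).

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

class PerKeyReduce {
    // Hypothetical key-to-function table (the "key to closure map").
    // Assumption: key 1 is reduced by summing, key 2 by taking the maximum.
    static final Map<Integer, BinaryOperator<Integer>> REDUCERS = new HashMap<>();
    static {
        REDUCERS.put(1, Integer::sum);
        REDUCERS.put(2, Integer::max);
    }

    // Convenience constructor for a (key, value) record.
    static Map.Entry<Integer, Integer> pair(int key, int value) {
        return new SimpleEntry<>(key, value);
    }

    // Merges two records that share a key. Because the key travels inside
    // the record, the function for that key can be looked up here.
    static Map.Entry<Integer, Integer> merge(Map.Entry<Integer, Integer> a,
                                             Map.Entry<Integer, Integer> b) {
        BinaryOperator<Integer> f = REDUCERS.get(a.getKey());
        return pair(a.getKey(), f.apply(a.getValue(), b.getValue()));
    }

    // Local stand-in for a reduce-by-key: folds records per key with merge.
    static Map<Integer, Integer> reduceByKey(List<Map.Entry<Integer, Integer>> records) {
        Map<Integer, Map.Entry<Integer, Integer>> acc = new HashMap<>();
        for (Map.Entry<Integer, Integer> r : records) {
            acc.merge(r.getKey(), r, PerKeyReduce::merge);
        }
        Map<Integer, Integer> out = new HashMap<>();
        acc.forEach((k, v) -> out.put(k, v.getValue()));
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<Integer, Integer>> records = Arrays.asList(
                pair(1, 1), pair(1, 2), pair(1, 3),
                pair(2, 5), pair(2, 9), pair(2, 4));
        System.out.println(reduceByKey(records));
    }
}
```

Each per-key function has to be associative and commutative for this to be safe under a combiner, since partial results are merged in no particular order.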

On Tue, Nov 18, 2014 at 7:24 AM, Yanbo <yanbohappy@gmail.com> wrote:

> First use groupByKey(); you get a pair RDD of
> (key: K, value: ArrayBuffer[V]).
> Then use map() on this RDD with a function that performs different
> operations depending on the key, which is passed to it as a parameter.
>
>
> > On Nov 18, 2014, at 8:59 PM, jelgh <johannes.elgh@gmail.com> wrote:
> >
> > Hello everyone,
> >
> > I'm new to Spark and I have the following problem:
> >
> > I have this large JavaRDD<MyClass> collection, which I group by
> > creating a hashcode from some fields in MyClass:
> >
> > JavaRDD<MyClass> collection = ...;
> > JavaPairRDD<Integer, Iterable<MyClass>> grouped =
> > collection.groupBy(...); // the group function just creates a hashcode
> > // from some fields in MyClass.
> >
> > Now I want to reduce the variable grouped. However, I want to reduce it
> > with different functions depending on the key in the JavaPairRDD. So
> > basically a reduceByKey but with multiple functions.
> >
> > The only solution I've come up with is to filter grouped once per reduce
> > function and apply that function to the filtered subset. This feels kind
> > of hackish, though.
> >
> > Is there a better way?
> >
> > Best regards,
> > Johannes
> >
> >
> >
> > --
> > View this message in context:
> > http://apache-spark-user-list.1001560.n3.nabble.com/ReduceByKey-but-with-different-functions-depending-on-key-tp19177.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> > For additional commands, e-mail: user-help@spark.apache.org
> >
