spark-user mailing list archives

From Sean Owen <>
Subject Re: Combiners in Spark
Date Mon, 02 Mar 2015 11:29:58 GMT
I think the simplest answer is that it's not really a separate concept
from the 'reduce' function, because Spark's API is a simpler, purer
form of FP. It is just the same function, which can be applied at
many points in an aggregation -- both "map side" (a la Combiners in
MapReduce) and "reduce side" (a la Reducers in MapReduce).
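
As a rough illustration of "the same function at many points", here is
a toy model in plain Python. It is not Spark code: reduce_by_key and the
list-of-lists "partitions" layout are made up for this sketch and only
mimic what reduceByKey does conceptually.

```python
from operator import add


def reduce_by_key(partitions, func):
    """Toy model of reduceByKey (hypothetical helper, not a Spark API).

    func is applied inside each partition ("map side") and then again
    when the per-partition results are merged ("reduce side").
    """
    # "Map side": pre-aggregate the records of each partition locally.
    partials = []
    for part in partitions:
        local = {}
        for key, value in part:
            local[key] = func(local[key], value) if key in local else value
        partials.append(local)

    # "Reduce side": merge the partial results with the very same function.
    merged = {}
    for local in partials:
        for key, value in local.items():
            merged[key] = func(merged[key], value) if key in merged else value
    return merged


# Two made-up partitions of (key, value) records.
partitions = [[("a", 1), ("b", 2), ("a", 3)], [("a", 4), ("b", 5)]]
result = reduce_by_key(partitions, add)
print(result)  # {'a': 8, 'b': 7}
```

This only gives a well-defined answer when func is associative and
commutative, which is the same requirement reduceByKey places on its
function.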

Even in MapReduce, you could often use the same function for combine
and reduce. They were separate mostly because the semantics of what
happened in a Reducer were different; a Reducer would map to more than
just reduceByKey in Spark.

These various "ByKey" operations build on combineByKey, yes. Despite
its name, mergeCombiners is not only a Combiner-style function. It's
the reduce function, applied in several places. You can control
whether it is applied map-side, and by default it is.
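
To make the three combineByKey functions concrete, here is a toy model
in plain Python, again not Spark code. The helper combine_by_key and its
map_side_combine flag are hypothetical; the flag is loosely modelled on
the mapSideCombine parameter of the Scala combineByKey. The example
builds a (sum, count) pair per key, so the combined type differs from
the value type.

```python
def combine_by_key(partitions, create_combiner, merge_value, merge_combiners,
                   map_side_combine=True):
    """Toy model of combineByKey (hypothetical helper, not a Spark API)."""

    def combine_values(records):
        # Fold raw values into one combiner per key.
        combined = {}
        for key, value in records:
            if key in combined:
                combined[key] = merge_value(combined[key], value)
            else:
                combined[key] = create_combiner(value)
        return combined

    if not map_side_combine:
        # No local pre-aggregation: all raw records are "shuffled" and
        # the combiners are built only on the reduce side.
        return combine_values(rec for part in partitions for rec in part)

    # Map side: build combiners per partition; reduce side: merge them.
    merged = {}
    for part in partitions:
        for key, combiner in combine_values(part).items():
            if key in merged:
                merged[key] = merge_combiners(merged[key], combiner)
            else:
                merged[key] = combiner
    return merged


# Two made-up partitions of (key, value) records.
partitions = [[("a", 1), ("b", 2), ("a", 3)], [("a", 4), ("b", 6)]]


def create(value):
    return (value, 1)                      # value -> (sum, count)


def add_value(acc, value):
    return (acc[0] + value, acc[1] + 1)    # fold a value into a combiner


def add_combiners(x, y):
    return (x[0] + y[0], x[1] + y[1])      # merge two combiners


with_combine = combine_by_key(partitions, create, add_value, add_combiners)
without_combine = combine_by_key(partitions, create, add_value, add_combiners,
                                 map_side_combine=False)
print(with_combine)     # {'a': (8, 3), 'b': (8, 2)}
print(without_combine)  # {'a': (8, 3), 'b': (8, 2)}
```

Both settings give the same result; the flag only changes where the
merging happens. reduceByKey is the special case where createCombiner is
the identity and both merge functions are the same reduce function.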

So combiners are pretty automatic in Spark.

On Mon, Mar 2, 2015 at 10:55 AM, Guillermo Ortiz <> wrote:
> Which function in Spark is the equivalent of "Combiners" in MapReduce?
> I guess that it's combineByKey, but is combineByKey executed locally?
> I understand that functions such as reduceByKey or foldByKey aren't executed locally.
> Reading the documentation, it looks like combineByKey is equivalent to
> reduceByKey, except that with combineByKey you can specify an output
> type different from the input type.
