spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Reynold Xin <>
Subject Re: Disallowing null mergeCombiners
Date Wed, 01 Jan 2014 01:09:36 GMT
I added the option that doesn't require the caller to specify the
mergeCombiner closure a while ago when I wanted to disable mapSideCombine.
In virtually all use cases I know of, it is fine & easy to specify a
mergeCombiner, so I'm all for this given it simplifies the codebase.

On Tue, Dec 31, 2013 at 5:05 PM, Patrick Wendell <> wrote:

> Hey All,
> There is a small API change that we are considering for the external
> sort patch. Previously we allowed mergeCombiner to be null when map
> side aggregation was not enabled. This is because it wasn't necessary
> in that case since mappers didn't ship pre-aggregated values to
> reducers.
> Because the external sort capability also relies on the mergeCombiner
> function to merge partially-aggregated on-disk segments, we now need
> it all the time, even if map side aggregation is enabled. This is a
> fairly esoteric thing that I'm not sure anyone other than Shark ever
> used, but I want to check in case anyone had feelings about this.
> The relevant code is here:
> - Patrick

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message