crunch-user mailing list archives

From Dave Beech <>
Subject Re: PGroupedTable.combineValues question
Date Fri, 05 Apr 2013 16:00:18 GMT
Hi Gabriel. Thanks for that. It seemed a bit wrong to *not* be using
combineValues, but a DoFn works just fine.


On 5 April 2013 16:48, Gabriel Reid <> wrote:

> Hi Dave,
> On Fri, Apr 5, 2013 at 5:05 PM, Dave Beech <> wrote:
>> I have a PGroupedTable<A,B> and I want to aggregate / combine the values
>> to produce a PCollection<C> - in other words, I need the type of the
>> aggregate to be different to the original value type.
>> What's the best approach? The combineValues method takes either an
>> Aggregator or a CombineFn but as far as I can see, both of these assume the
>> end result will be of the same type as the values.
> The approach that I always use for this is just creating a custom DoFn to
> operate on the PGroupedTable and construct the instance of type C based on
> the incoming Iterable from the PGroupedTable. This basically works out to
> the same as an Aggregator.
> I don't think that this scenario would be technically applicable to a
> CombineFn, because the CombineFn can be called any number of times on an
> incoming set of values, on both the map and reduce sides of a job. In order
> to map values to another type, the intermediate value of type C would
> somehow need to be given to the CombineFn each time it was used.
> - Gabriel
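
For illustration only, here is a minimal plain-Java sketch of the distinction
discussed above. It does not use the Crunch API: the class
GroupedAggregationSketch, the Stats type (standing in for C), and the methods
summarize and combine are all hypothetical names, and Integer stands in for
the value type B. The idea is that a one-shot function over the full Iterable
can return any type, while a combiner-style function may be re-applied to its
own partial results, so its output has to be usable as its input.

```java
import java.util.Arrays;
import java.util.List;

public class GroupedAggregationSketch {

    // Hypothetical result type "C": a summary that differs from the value type.
    static final class Stats {
        final long count;
        final long sum;

        Stats(long count, long sum) {
            this.count = count;
            this.sum = sum;
        }
    }

    // DoFn-style: sees all values for one key exactly once,
    // so it is free to build a result of a different type.
    static Stats summarize(Iterable<Integer> values) {
        long count = 0;
        long sum = 0;
        for (int v : values) {
            count++;
            sum += v;
        }
        return new Stats(count, sum);
    }

    // Combiner-style: may run on partial groups (map side) and again on
    // the partial results (reduce side), so input and output types match.
    static Integer combine(Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> all = Arrays.asList(1, 2, 3, 4);

        // One pass over the complete group, producing a Stats instance.
        Stats stats = summarize(all);
        System.out.println(stats.count + " values, sum " + stats.sum);

        // Two-stage combine: partial results are fed back into combine,
        // which only works because the output type equals the input type.
        Integer partialA = combine(Arrays.asList(1, 2));
        Integer partialB = combine(Arrays.asList(3, 4));
        Integer total = combine(Arrays.asList(partialA, partialB));
        System.out.println("combined sum " + total);
    }
}
```

In this sketch, feeding a Stats back into summarize would not compile, which
mirrors the point that the intermediate value of type C would somehow have to
be given back to the CombineFn each time it was used.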
