You can still call parallelDo on a PGroupedTable to map it to a
different type. It would just be a new DoFn<Pair<Key, Set<String>>, Pair<Key,
Integer>>.
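
To make the two stages concrete, here is a minimal plain-Java sketch with no
Crunch dependency. The class and method names (UnionSizeSketch, combine,
toSize) are made up for illustration: combine stands in for the logic you
would hand to combineValues, and toSize for the body of the DoFn mentioned
above. It assumes the combiner may run on partial groups, which is safe here
because set union gives the same result no matter how the inputs are split.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UnionSizeSketch {

    // Combiner stage: Iterable<Set<String>> -> Set<String>.
    // Union is associative, commutative and idempotent, so applying it to
    // partial groups (zero or more times) does not change the final result.
    static Set<String> combine(Iterable<Set<String>> values) {
        Set<String> union = new HashSet<>();
        for (Set<String> v : values) {
            union.addAll(v);
        }
        return union;
    }

    // Final mapping stage: Set<String> -> Integer.
    static int toSize(Set<String> combined) {
        return combined.size();
    }

    public static void main(String[] args) {
        // Two hypothetical map-side partial groups for the same key.
        List<Set<String>> partial1 = Arrays.<Set<String>>asList(
                new HashSet<>(Arrays.asList("a", "b")),
                new HashSet<>(Arrays.asList("b", "c")));
        List<Set<String>> partial2 = Arrays.<Set<String>>asList(
                new HashSet<>(Arrays.asList("c", "d")));

        // Combine each partial group, then combine the partial unions.
        Set<String> merged = combine(Arrays.<Set<String>>asList(
                combine(partial1), combine(partial2)));

        System.out.println(toSize(merged)); // prints 4
    }
}
```

The point is only that the type change (Set<String> to Integer) happens in a
separate step after the combine, so the combine itself can keep the same
input and output value type.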
On Tue, May 17, 2016 at 2:01 AM Stan Rosenberg <stan.rosenberg@gmail.com>
wrote:
> Hi,
>
> I couldn't seem to find sufficient documentation or examples of using
> combiners in non-trivial ways. Say my map emits values of type Set<String>;
> after grouping by key I want to emit the _size_ of the union of the sets of
> strings, i.e., size(union(Iterable<Set<String>>)). Thus, the combiner's
> type is Iterable<Set<String>> -> Set<String>, but the reducer's type is
> Iterable<Set<String>> -> Int.
>
> To my knowledge, both MapReduce and Spark allow a combiner to have a
> result type different from the reducer's. However, unless I missed something,
> this is not expressible in Crunch. Shouldn't PGroupedTable.combineValues
> return PGroupedTable to allow composition with mapValues?
>
> Thanks,
>
> stan
>