Hi,
I couldn't seem to find sufficient documentation or examples of using
combiners in non-trivial ways. Say my map emits values of type Set<String>;
after grouping by key I want to emit the _size_ of the union of the sets of
strings, i.e., size(union(Iterable<Set<String>>)). Thus, the combiner's
type is Iterable<Set<String>> -> Set<String>, but the reducer's type is
Iterable<Set<String>> -> Int.
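
To make the two types concrete, here is a minimal plain-Java sketch. It
does not use any Crunch, MapReduce, or Spark API; the class name
UnionSizeSketch and the method names combine and reduce are made up for
illustration only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of any framework.
class UnionSizeSketch {
    // Combiner: Iterable<Set<String>> -> Set<String>
    // (same type as the map output value, so it can run zero or more times).
    static Set<String> combine(Iterable<Set<String>> values) {
        Set<String> union = new HashSet<>();
        for (Set<String> s : values) {
            union.addAll(s);
        }
        return union;
    }

    // Reducer: Iterable<Set<String>> -> int (a different result type).
    static int reduce(Iterable<Set<String>> values) {
        return combine(values).size();
    }

    public static void main(String[] args) {
        Set<String> a = new HashSet<>(Arrays.asList("x", "y"));
        Set<String> b = new HashSet<>(Arrays.asList("y", "z"));
        Set<String> c = new HashSet<>(Arrays.asList("z", "w"));

        // Pretend a and b came from one mapper and were pre-combined there,
        // while c arrived at the reducer uncombined.
        Set<String> partial = combine(Arrays.asList(a, b));
        List<Set<String>> atReducer = Arrays.asList(partial, c);

        System.out.println(reduce(atReducer)); // prints 4
    }
}
```

The reducer gives the same answer whether or not the combiner ran, which
is the property a combiner has to preserve.
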
To my knowledge, both MapReduce and Spark allow a combiner to have a result
type different from the reducer's. However, unless I missed something, this is
not expressible in Crunch. Shouldn't PGroupedTable.combineValues return
PGroupedTable to allow composition with mapValues?
Thanks,
stan