I couldn't find sufficient documentation or examples of using combiners in non-trivial ways. Say my map emits values of type Set<String>; after grouping by key, I want to emit the _size_ of the union of those sets, i.e., size(union(Iterable<Set<String>>)). Thus, the combiner's type is Iterable<Set<String>> -> Set<String>, but the reducer's type is Iterable<Set<String>> -> Int.
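To make the two types concrete, here is a minimal plain-Java sketch of what I mean. It uses no Crunch or Hadoop APIs; the class name UnionSizeSketch and the methods combine and reduce are made up purely for illustration, and main just simulates two map-side partial results being merged on the reduce side.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only: not a Crunch, Hadoop, or Spark API.
public class UnionSizeSketch {

    // Combiner: Iterable<Set<String>> -> Set<String>.
    // The result has the same type as the map output value,
    // so it can be applied any number of times before the reduce.
    static Set<String> combine(Iterable<Set<String>> values) {
        Set<String> union = new HashSet<>();
        for (Set<String> s : values) {
            union.addAll(s);
        }
        return union;
    }

    // Reducer: Iterable<Set<String>> -> int.
    // It sees a mix of raw and pre-combined sets and emits only the size.
    static int reduce(Iterable<Set<String>> values) {
        return combine(values).size();
    }

    public static void main(String[] args) {
        Set<String> a = new HashSet<>(Arrays.asList("x", "y"));
        Set<String> b = new HashSet<>(Arrays.asList("y", "z"));
        Set<String> c = new HashSet<>(Arrays.asList("z", "w"));

        // Simulate two map tasks, each running the combiner locally.
        Set<String> partial1 = combine(Arrays.asList(a, b));
        Set<String> partial2 = combine(Arrays.asList(c));

        // The reduce side merges the partial unions and emits the size.
        System.out.println(reduce(Arrays.asList(partial1, partial2))); // prints 4
    }
}
```

The point is that combine has to keep the full set around (a size alone can't be merged correctly), while only the final step collapses it to a number.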
To my knowledge, both MapReduce and Spark allow a combiner to have a result type different from the reducer's. However, unless I missed something, this is not expressible in Crunch. Shouldn't PGroupedTable.combineValues return a PGroupedTable to allow composition with mapValues?