crunch-user mailing list archives

From Stan Rosenberg <>
Subject PGroupedTable.combineValues should allow composition with PGroupedTable.mapValues
Date Tue, 17 May 2016 06:01:18 GMT

I couldn't find sufficient documentation or examples of using
combiners in non-trivial ways. Say my map emits values of type Set<String>;
after grouping by key, I want to emit the _size_ of the union of the sets of
strings, i.e., size(union(Iterable<Set<String>>)). Thus, the combiner's
type is Iterable<Set<String>> -> Set<String>, but the reducer's type is
Iterable<Set<String>> -> Int.
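
To make the two types concrete, here is a minimal plain-Java sketch of the
combine and reduce steps described above. It deliberately does not use the
Crunch API; the class name UnionSizeSketch and the methods combine and reduce
are hypothetical names chosen for illustration only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class UnionSizeSketch {

    // Combiner: Iterable<Set<String>> -> Set<String>.
    // Output type equals the value type, so it can be applied to partial
    // groups any number of times before the reduce step.
    static Set<String> combine(Iterable<Set<String>> values) {
        Set<String> union = new HashSet<>();
        for (Set<String> s : values) {
            union.addAll(s);
        }
        return union;
    }

    // Reducer: Iterable<Set<String>> -> int.
    // Unions whatever the combiner left behind, then emits only the size.
    static int reduce(Iterable<Set<String>> values) {
        return combine(values).size();
    }

    public static void main(String[] args) {
        Set<String> a = new HashSet<>(Arrays.asList("x", "y"));
        Set<String> b = new HashSet<>(Arrays.asList("y", "z"));
        Set<String> c = new HashSet<>(Arrays.asList("z", "w"));

        // Map side: the combiner sees only part of the group (a and b).
        Set<String> partial = combine(Arrays.asList(a, b));

        // Reduce side: sees the combined partial result plus c.
        int size = reduce(Arrays.asList(partial, c));
        System.out.println(size); // prints 4 (x, y, z, w)
    }
}
```

The point of the sketch is that the combiner must keep the value type
(Set<String>) so it can run on partial groups, while only the final step
changes the type to an integer.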

To my knowledge, both MapReduce and Spark allow a combiner to have a result
type different from the reducer's. However, unless I missed something, this
is not expressible in Crunch. Shouldn't PGroupedTable.combineValues return
a PGroupedTable to allow composition with mapValues?


