flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sebastian Kruse (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-1043) Alternative combine interface
Date Fri, 08 Aug 2014 14:34:12 GMT

    [ https://issues.apache.org/jira/browse/FLINK-1043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090803#comment-14090803

Sebastian Kruse commented on FLINK-1043:

I see. Another approach might be to have a reduce-merge interface instead of the classical
combine-reduce. Oftentimes, the combine and reduce functions are logically equivalent (for
associative, commutative functions), only the combine function works only on the subset of
the complete dataset. Therefore, one could the reduce step as core logic but a subsequent
merge step as the optional part.

> Alternative combine interface
> -----------------------------
>                 Key: FLINK-1043
>                 URL: https://issues.apache.org/jira/browse/FLINK-1043
>             Project: Flink
>          Issue Type: Wish
>            Reporter: Sebastian Kruse
> The GroupReduce allows for the following combination reduce step: {{InputType}} ->
combine -> {{InputType}} -> reduce -> {{OutputType}}. However, in the use cases I
have stumbled upon so far, it would make more sense to have the following steps: {{InputType}}
-> {{OutputType}} -> {{OutputType}}. It seems more intuitive to me to create a set of
partial results with the combiners that will finally merged within the reduce phase into an
overall result. This sometimes bars me from using a combiner.
> I provide some examples for this intuition.
> * WordCount
> ** If you want to implement WordCount with as a combinable GroupReduce, then you have
to preprocess all words as {{Tuple2<String, 1>}}. This could be avoided if the combination
result was not necessarily equal to the input type.
> * create a Bloom filter
> ** Bloom filters can be created locally on each node and later on be merged into a final,
global Bloom filter, thus lend themselves for a combine-reduce proceeding. Doing this with
a combinable GroupReduce would currently require to turn each input element into a singleton
Bloom filter before the combination phase.
> Therefore, it would be nice to have the ability to use {{OutputType}} as the combiner

This message was sent by Atlassian JIRA

View raw message