flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Fabian Hueske (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-1043) Alternative combine interface
Date Fri, 08 Aug 2014 15:09:11 GMT

    [ https://issues.apache.org/jira/browse/FLINK-1043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090842#comment-14090842

Fabian Hueske commented on FLINK-1043:

Actually, I like the idea of a second (non-optional) combiner interface. It would indeed make
some things a lot nicer as your examples show.
I think it shouldn't be a lot of effort to get that working.

> Alternative combine interface
> -----------------------------
>                 Key: FLINK-1043
>                 URL: https://issues.apache.org/jira/browse/FLINK-1043
>             Project: Flink
>          Issue Type: Wish
>            Reporter: Sebastian Kruse
> The GroupReduce allows for the following combination reduce step: {{InputType}} ->
combine -> {{InputType}} -> reduce -> {{OutputType}}. However, in the use cases I
have stumbled upon so far, it would make more sense to have the following steps: {{InputType}}
-> {{OutputType}} -> {{OutputType}}. It seems more intuitive to me to create a set of
partial results with the combiners that will finally merged within the reduce phase into an
overall result. This sometimes bars me from using a combiner.
> I provide some examples for this intuition.
> * WordCount
> ** If you want to implement WordCount with as a combinable GroupReduce, then you have
to preprocess all words as {{Tuple2<String, 1>}}. This could be avoided if the combination
result was not necessarily equal to the input type.
> * create a Bloom filter
> ** Bloom filters can be created locally on each node and later on be merged into a final,
global Bloom filter, thus lend themselves for a combine-reduce proceeding. Doing this with
a combinable GroupReduce would currently require to turn each input element into a singleton
Bloom filter before the combination phase.
> Therefore, it would be nice to have the ability to use {{OutputType}} as the combiner

This message was sent by Atlassian JIRA

View raw message