flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-758) Add count method to DataSet and implement CountOperator
Date Sun, 07 Sep 2014 16:30:28 GMT

    [ https://issues.apache.org/jira/browse/FLINK-758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124957#comment-14124957
] 

ASF GitHub Bot commented on FLINK-758:
--------------------------------------

Github user uce commented on the pull request:

    https://github.com/apache/incubator-flink/pull/63#issuecomment-54751876
  
    Thanks for the review. The initial value for the reduce function and the count operator
are tightly connected. The reduce with initial value is the general solution, of which the
count operator is a special case. Therefore, I wouldn't say that these are independent features.
The refactorings are also limited to files related to the initial value reduce/count operator.
    
    The counting for grouped data sets was a quick fix after @hsaputra's comment. We can either
fix it with this PR or open a separate issue if we want to merge it.
    
    I think the limitation to AllReduce was the result of a discussion with you and @StephanEwen.
    
    ---
    
    All in all, I think that we should wait for the upcoming changes to the runtime and scheduler
to support the more intuitive API of simply returning the count to the user program. As you
said, we might move some of the changes (like initial value reduce) to a separate issue if
we find them useful.
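
To illustrate the relationship described above, here is a minimal sketch in plain Java (java.util.stream), not the Flink API and not the code from the pull request. The class and method names (`CountSketch`, `reduceWithoutInitial`, `reduceWithInitial`, `count`) are hypothetical. It shows why a reduce without an initial value has nothing to return for empty input, and how count falls out as a special case of a reduce seeded with an initial value.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.BinaryOperator;
import java.util.stream.Collectors;

// Hypothetical illustration only; these names do not exist in Flink.
final class CountSketch {

    // Plain reduce: with no elements there is nothing to combine,
    // so an empty input produces no result.
    public static <T> Optional<T> reduceWithoutInitial(List<T> input, BinaryOperator<T> f) {
        return input.stream().reduce(f);
    }

    // Reduce with an initial value: an empty input yields the initial value.
    public static <T> T reduceWithInitial(List<T> input, T initial, BinaryOperator<T> f) {
        return input.stream().reduce(initial, f);
    }

    // Count as a special case: map every element to 1, then sum,
    // using 0 as the initial value so that empty input counts as 0.
    public static <T> long count(List<T> input) {
        List<Long> ones = input.stream().map(x -> 1L).collect(Collectors.toList());
        return reduceWithInitial(ones, 0L, Long::sum);
    }
}
```

Under these assumptions, `CountSketch.count(List.of("a", "b", "c"))` evaluates to 3 and `CountSketch.count(List.of())` evaluates to 0, whereas `reduceWithoutInitial` on an empty list returns an empty `Optional`.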


> Add count method to DataSet and implement CountOperator
> -------------------------------------------------------
>
>                 Key: FLINK-758
>                 URL: https://issues.apache.org/jira/browse/FLINK-758
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: GitHub Import
>              Labels: github-import
>             Fix For: pre-apache
>
>         Attachments: pull-request-758-7518001488867571817.patch
>
>
> At the request of @twalthr. This is the count operator I implemented some time ago
to get to know the new Java API. It introduces `DataSet.count()`, which is executed as
a map (to ones) and reduce (sum up the ones). I initially didn't do the PR, because of the
following problem: empty DataSets don't work as the first map won't have any input to operate
on.
> If more people think that we should include this operator we can think about a possible
solution to the problem.
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/stratosphere/stratosphere/pull/758
> Created by: [uce|https://github.com/uce]
> Labels: enhancement, java api, 
> Milestone: Release 0.6 (unplanned)
> Created at: Tue May 06 10:42:33 CEST 2014
> State: open



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
