flink-issues mailing list archives

From "Greg Hogan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-3789) Overload methods which trigger program execution to allow naming job
Date Thu, 21 Apr 2016 00:40:25 GMT

    [ https://issues.apache.org/jira/browse/FLINK-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15251023#comment-15251023 ]

Greg Hogan commented on FLINK-3789:

I was thinking about Clustering Coefficient, for which we return the local clustering coefficient
for each vertex as a DataSet via a GraphAlgorithm. It would also be nice to compute
the global clustering coefficient, which would need access to accumulators. Both the local and
global clustering coefficients count triangles, so there is certainly an advantage in computing
the two simultaneously, but each carries extra cost, so we should also allow separate computation.
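
To illustrate why the two computations overlap, here is a minimal sketch in plain Java (not Gelly or the DataSet API). The class and method names are hypothetical, and the convention that a vertex with fewer than two neighbors has a local coefficient of 0 is an assumption. A single pass counts the triangles through each vertex; the local score is that count divided by the vertex's neighbor pairs, and the global score is the ratio of the two sums.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, single-machine illustration; not part of Flink or Gelly.
final class ClusteringCoefficients {

    /** Local coefficient per vertex plus the global coefficient. */
    static final class Result {
        final Map<Integer, Double> local = new HashMap<>();
        double global;
    }

    /** Builds an undirected simple graph from {u, v} edge pairs. */
    static Map<Integer, Set<Integer>> adjacency(int[][] edges) {
        Map<Integer, Set<Integer>> adj = new HashMap<>();
        for (int[] e : edges) {
            if (e[0] == e[1]) {
                continue; // ignore self-loops
            }
            adj.computeIfAbsent(e[0], k -> new HashSet<>()).add(e[1]);
            adj.computeIfAbsent(e[1], k -> new HashSet<>()).add(e[0]);
        }
        return adj;
    }

    /** One pass over the vertices yields both the local and global scores. */
    static Result compute(Map<Integer, Set<Integer>> adj) {
        Result result = new Result();
        long closed = 0;   // sum over vertices of triangles through the vertex
        long triplets = 0; // sum over vertices of neighbor pairs, d * (d - 1) / 2
        for (Map.Entry<Integer, Set<Integer>> entry : adj.entrySet()) {
            Set<Integer> neighbors = entry.getValue();
            long degree = neighbors.size();
            long pairs = degree * (degree - 1) / 2;
            long triangles = 0;
            for (int a : neighbors) {
                for (int b : neighbors) {
                    // Count each connected neighbor pair once.
                    if (a < b && adj.get(a).contains(b)) {
                        triangles++;
                    }
                }
            }
            // Assumption: vertices with fewer than two neighbors score 0.
            result.local.put(entry.getKey(), pairs == 0 ? 0.0 : (double) triangles / pairs);
            closed += triangles;
            triplets += pairs;
        }
        result.global = triplets == 0 ? 0.0 : (double) closed / triplets;
        return result;
    }
}
```

In a distributed job the two running sums are the kind of job-wide totals that accumulators could carry, while the per-vertex scores remain an ordinary DataSet.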

So there is a need for something similar to collect and count that still lets the user perform
the execute (which of course allows direct configuration of the job name), so they can compose
multiple algorithms and analytics. Perhaps instead of overloading these functions we can provide
alternative, slightly more sophisticated options which would allow configuring a job name.
In many ways the current implementation of count, collect, print, and checksum is very limiting
because you can only perform that single action per job. You can't print and count, or print
and write. The current DataSet API works well because it's simple, but I think we could expand
on this.
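
As a rough sketch of what such an option might look like, the toy model below registers several actions and resolves them all in one user-triggered, named execution. It is plain Java over an in-memory list, not Flink code: `NamedJobSketch`, `Deferred`, `deferredCount`, and `deferredCollect` are hypothetical names invented for illustration and do not exist in the DataSet API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model; none of these names exist in Flink.
final class NamedJobSketch<T> {

    /** Holds a result that only becomes available once the job has run. */
    static final class Deferred<R> {
        private R value;
        private boolean ready;

        R get() {
            if (!ready) {
                throw new IllegalStateException("job has not been executed yet");
            }
            return value;
        }

        private void set(R newValue) {
            value = newValue;
            ready = true;
        }
    }

    private final List<T> data;
    private final List<Runnable> actions = new ArrayList<>();
    private final List<String> executedJobNames = new ArrayList<>();

    NamedJobSketch(List<T> data) {
        this.data = data;
    }

    /** Registers a count; nothing runs until execute is called. */
    Deferred<Long> deferredCount() {
        Deferred<Long> result = new Deferred<>();
        actions.add(() -> result.set((long) data.size()));
        return result;
    }

    /** Registers a collect; nothing runs until execute is called. */
    Deferred<List<T>> deferredCollect() {
        Deferred<List<T>> result = new Deferred<>();
        actions.add(() -> result.set(new ArrayList<>(data)));
        return result;
    }

    /** Runs every registered action as one job under the given name. */
    void execute(String jobName) {
        executedJobNames.add(jobName);
        for (Runnable action : actions) {
            action.run();
        }
        actions.clear();
    }

    /** Names of the jobs run so far, one entry per execute call. */
    List<String> executedJobNames() {
        return executedJobNames;
    }
}
```

With this shape the user registers both a count and a collect, calls `execute("count and collect")` once, and then reads both results, so the job name stays under the user's control and two actions share one job.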

> Overload methods which trigger program execution to allow naming job
> --------------------------------------------------------------------
>                 Key: FLINK-3789
>                 URL: https://issues.apache.org/jira/browse/FLINK-3789
>             Project: Flink
>          Issue Type: Improvement
>          Components: Java API
>    Affects Versions: 1.1.0
>            Reporter: Greg Hogan
>            Assignee: Greg Hogan
>            Priority: Minor
> Overload the following functions to additionally accept a job name to pass to {{ExecutionEnvironment.execute(String)}}.
> * {{DataSet.collect()}}
> * {{DataSet.count()}}
> * {{DataSetUtils.checksumHashCode(DataSet)}}
> * {{GraphUtils.checksumHashCode(Graph)}}
> Once the deprecated {{DataSet.print(String)}} and {{DataSet.printToErr(String)}} are
> removed we can overload {{DataSet.print()}}.

This message was sent by Atlassian JIRA
