flink-issues mailing list archives

From "Stephan Ewen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-2716) Checksum method for DataSet and Graph
Date Fri, 09 Oct 2015 14:45:30 GMT

    [ https://issues.apache.org/jira/browse/FLINK-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950500#comment-14950500 ]

Stephan Ewen commented on FLINK-2716:

Comparators always refer to keys. Some operations thus only need serializers.

For example, you can have two reducers consuming the result of the same mapper. In that
case, the type and serializer between the mapper and both reducers are the same, but the
comparators may be different if the reducers group on different keys.
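
To illustrate the point, here is a minimal sketch in plain Java. It does not use Flink's serializer or comparator classes; `Event`, `serialize`, `deserialize`, `BY_USER`, `BY_COUNTRY`, and `reduceClicks` are hypothetical names made up for this example. One serializer pair is shared by both consumers of the mapper output, while each reducer supplies its own key comparator.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SharedSerializerSketch {

    /** Hypothetical record type emitted by the mapper. */
    public static final class Event {
        final String user;
        final String country;
        final int clicks;

        Event(String user, String country, int clicks) {
            this.user = user;
            this.country = country;
            this.clicks = clicks;
        }
    }

    // One serializer pair for the type, shared by every consumer of the mapper output.
    static byte[] serialize(Event e) {
        return (e.user + "," + e.country + "," + e.clicks).getBytes(StandardCharsets.UTF_8);
    }

    static Event deserialize(byte[] bytes) {
        String[] parts = new String(bytes, StandardCharsets.UTF_8).split(",");
        return new Event(parts[0], parts[1], Integer.parseInt(parts[2]));
    }

    // Comparators are defined per key, so each reducer brings its own.
    public static final Comparator<Event> BY_USER = Comparator.comparing((Event e) -> e.user);
    public static final Comparator<Event> BY_COUNTRY = Comparator.comparing((Event e) -> e.country);

    /** The serialized mapper output that both reducers read. */
    public static List<byte[]> mapperOutput() {
        List<byte[]> wire = new ArrayList<>();
        wire.add(serialize(new Event("alice", "DE", 1)));
        wire.add(serialize(new Event("bob", "DE", 2)));
        wire.add(serialize(new Event("alice", "US", 4)));
        return wire;
    }

    /** Sort-based grouping: deserialize, sort by key, then sum clicks per run of equal keys. */
    public static List<Integer> reduceClicks(List<byte[]> wire, Comparator<Event> keyComparator) {
        List<Event> events = new ArrayList<>();
        for (byte[] bytes : wire) {
            events.add(deserialize(bytes));
        }
        events.sort(keyComparator);

        List<Integer> sums = new ArrayList<>();
        Event previous = null;
        for (Event e : events) {
            if (previous != null && keyComparator.compare(previous, e) == 0) {
                int last = sums.size() - 1;
                sums.set(last, sums.get(last) + e.clicks);
            } else {
                sums.add(e.clicks);
            }
            previous = e;
        }
        return sums;
    }

    public static void main(String[] args) {
        List<byte[]> wire = mapperOutput();
        System.out.println(reduceClicks(wire, BY_USER));    // grouped by user
        System.out.println(reduceClicks(wire, BY_COUNTRY)); // grouped by country
    }
}
```

Both calls read the same serialized bytes; only the comparator passed in differs. An operation that merely forwards or stores the records would need `serialize` and `deserialize` but no comparator at all.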

> Checksum method for DataSet and Graph
> -------------------------------------
>                 Key: FLINK-2716
>                 URL: https://issues.apache.org/jira/browse/FLINK-2716
>             Project: Flink
>          Issue Type: Improvement
>          Components: Gelly, Java API, Scala API
>    Affects Versions: 0.10
>            Reporter: Greg Hogan
>            Assignee: Greg Hogan
>            Priority: Minor
> {{DataSet.count()}}, {{Graph.numberOfVertices()}}, and {{Graph.numberOfEdges()}} provide
> measures of the number of distributed data elements. New {{DataSet.checksum()}} and {{Graph.checksum()}}
> methods will summarize the content of data elements and support algorithm validation, integration
> testing, and benchmarking.
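
One way such a content summary could work is sketched below in plain Java. The issue text does not specify the algorithm, so this is an assumption: an element count plus a sum of element hash codes. `ChecksumSketch`, `Checksum`, `checksum`, and `plus` are hypothetical names, not the proposed Flink API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class ChecksumSketch {

    /** Hypothetical result type: element count plus a sum of element hash codes. */
    public static final class Checksum {
        public final long count;
        public final long hashSum;

        public Checksum(long count, long hashSum) {
            this.count = count;
            this.hashSum = hashSum;
        }

        /** Combine partial results, e.g. from different partitions of a distributed data set. */
        public Checksum plus(Checksum other) {
            return new Checksum(count + other.count, hashSum + other.hashSum);
        }
    }

    /** Addition is commutative, so the result does not depend on element order. */
    public static <T> Checksum checksum(Iterable<T> elements) {
        long count = 0;
        long hashSum = 0;
        for (T element : elements) {
            count++;
            // Treat the 32-bit hash code as unsigned before adding it to the 64-bit sum.
            hashSum += Objects.hashCode(element) & 0xffffffffL;
        }
        return new Checksum(count, hashSum);
    }

    public static void main(String[] args) {
        List<String> first = Arrays.asList("a", "b", "c");
        List<String> reordered = Arrays.asList("c", "a", "b");
        Checksum x = checksum(first);
        Checksum y = checksum(reordered);
        System.out.println(x.count == y.count && x.hashSum == y.hashSum);
    }
}
```

Because partial results combine with `plus`, each partition could compute its own pair and the pairs could be merged in any order. A hash-sum like this can collide, so it is suited to validation and testing rather than to proving equality.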

This message was sent by Atlassian JIRA
