flink-issues mailing list archives

From "Greg Hogan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-2716) Checksum method for DataSet and Graph
Date Fri, 02 Oct 2015 16:52:26 GMT

    [ https://issues.apache.org/jira/browse/FLINK-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941379#comment-14941379 ]

Greg Hogan commented on FLINK-2716:
-----------------------------------

[~StephanEwen], I would like to use {{TypeComparator.hash}} within a {{RichFlatMapFunction}}
(similar to {{DataSet.count}}) for this implementation. You noted an earlier discussion about
making serializers available to {{RichFunction}} implementations; access to type comparators
could be provided in the same way.
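A minimal, Flink-free sketch of the checksum idea, assuming each parallel flatMap instance emits an element count plus a sum of element hashes and the client adds the partial results together. Each inner {{List}} models one partition, and {{Object.hashCode}} stands in for {{TypeComparator.hash}}. The class and method names ({{ChecksumSketch}}, {{summarizePartition}}, {{checksum}}) are hypothetical and not part of the Flink API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToIntFunction;

/**
 * Hypothetical sketch: a DataSet checksum as (count, sum of hashes).
 * The hash function stands in for TypeComparator.hash (assumption).
 */
public class ChecksumSketch {

    /** Per-partition summary: number of elements and the sum of their hashes. */
    public static final class Checksum {
        public final long count;
        public final long hashSum;

        public Checksum(long count, long hashSum) {
            this.count = count;
            this.hashSum = hashSum;
        }

        /** Merging is plain addition, so the result ignores partitioning and order. */
        public Checksum merge(Checksum other) {
            return new Checksum(count + other.count, hashSum + other.hashSum);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Checksum)) {
                return false;
            }
            Checksum c = (Checksum) o;
            return count == c.count && hashSum == c.hashSum;
        }

        @Override
        public int hashCode() {
            return 31 * Long.hashCode(count) + Long.hashCode(hashSum);
        }
    }

    /** What one flatMap instance would compute over the records it receives. */
    public static <T> Checksum summarizePartition(List<T> partition, ToIntFunction<T> hash) {
        long count = 0;
        long sum = 0;
        for (T element : partition) {
            count++;
            sum += hash.applyAsInt(element); // long addition wraps, but stays order-independent
        }
        return new Checksum(count, sum);
    }

    /** Combine the per-partition summaries, as the client would after collecting them. */
    public static <T> Checksum checksum(List<List<T>> partitions, ToIntFunction<T> hash) {
        Checksum total = new Checksum(0, 0);
        for (List<T> partition : partitions) {
            total = total.merge(summarizePartition(partition, hash));
        }
        return total;
    }

    public static void main(String[] args) {
        ToIntFunction<String> hash = String::hashCode;
        Checksum a = checksum(Arrays.asList(Arrays.asList("a", "b"), Arrays.asList("c")), hash);
        Checksum b = checksum(Arrays.asList(Arrays.asList("c", "a"), Arrays.asList("b")), hash);
        System.out.println(a.equals(b)); // prints "true": same content, different layout
    }
}
```

Summing hashes keeps the checksum independent of parallelism and record order, which is what makes it usable for comparing results across runs with different partitioning.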

Ideally the user would see only as many serializers and type comparators as the function has
inputs: {{getInputSerializer()}} for single-input functions, and {{getFirstInputSerializer()}}
and {{getSecondInputSerializer()}} for dual-input functions.

Currently {{RichFlatMapFunction}} inherits from {{AbstractRichFunction}}, which provides
access to the {{RuntimeContext}}. We could add a layer and have each single-input function
inherit from an {{AbstractSingleInputRichFunction}} (similar to how {{FlatMapOperator}} inherits
from {{SingleInputUdfOperator}}) that would provide access to serializers and type comparators,
and likewise an {{AbstractTwoInputRichFunction}} for dual-input functions.
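A rough sketch of the proposed layering, under the assumption that the runtime injects the serializer and comparator for each input before the function runs. To keep it compilable without Flink, {{Serializer}}, {{HashComparator}} and {{RichFunctionBase}} are hypothetical stand-ins for {{TypeSerializer}}, {{TypeComparator}} and {{AbstractRichFunction}}; the comparator getters ({{getInputComparator()}} and friends), the setter names, and {{HashSumFunction}} are also hypothetical and only mirror the serializer getters named above.

```java
/**
 * Hypothetical sketch of an extra layer between AbstractRichFunction and
 * concrete single/dual-input functions. All types are stand-ins.
 */
public class RichFunctionLayering {

    /** Stand-in for TypeSerializer (hypothetical, intentionally empty). */
    public interface Serializer<T> { }

    /** Stand-in for the hashing part of TypeComparator (hypothetical). */
    public interface HashComparator<T> {
        int hash(T record);
    }

    /** Stand-in for AbstractRichFunction, which provides the RuntimeContext. */
    public abstract static class RichFunctionBase { }

    /** Single-input layer: exposes exactly one serializer and one comparator. */
    public abstract static class AbstractSingleInputRichFunction<IN> extends RichFunctionBase {
        private Serializer<IN> inputSerializer;
        private HashComparator<IN> inputComparator;

        /** Assumed to be called by the runtime during setup. */
        public void setInput(Serializer<IN> serializer, HashComparator<IN> comparator) {
            this.inputSerializer = serializer;
            this.inputComparator = comparator;
        }

        protected Serializer<IN> getInputSerializer() { return inputSerializer; }

        protected HashComparator<IN> getInputComparator() { return inputComparator; }
    }

    /** Dual-input layer: exposes one serializer and one comparator per input. */
    public abstract static class AbstractTwoInputRichFunction<IN1, IN2> extends RichFunctionBase {
        private Serializer<IN1> firstSerializer;
        private Serializer<IN2> secondSerializer;
        private HashComparator<IN1> firstComparator;
        private HashComparator<IN2> secondComparator;

        /** Assumed to be called by the runtime during setup. */
        public void setInputs(Serializer<IN1> s1, HashComparator<IN1> c1,
                              Serializer<IN2> s2, HashComparator<IN2> c2) {
            this.firstSerializer = s1;
            this.firstComparator = c1;
            this.secondSerializer = s2;
            this.secondComparator = c2;
        }

        protected Serializer<IN1> getFirstInputSerializer() { return firstSerializer; }

        protected Serializer<IN2> getSecondInputSerializer() { return secondSerializer; }

        protected HashComparator<IN1> getFirstInputComparator() { return firstComparator; }

        protected HashComparator<IN2> getSecondInputComparator() { return secondComparator; }
    }

    /** A checksum-style single-input function that hashes records with the injected comparator. */
    public static class HashSumFunction<T> extends AbstractSingleInputRichFunction<T> {
        private long sum;

        public void add(T record) {
            sum += getInputComparator().hash(record);
        }

        public long getSum() { return sum; }
    }

    public static void main(String[] args) {
        HashSumFunction<String> f = new HashSumFunction<>();
        // String::length is a toy "hash" so the result is easy to check by hand.
        f.setInput(new Serializer<String>() { }, String::length);
        f.add("ab");
        f.add("cde");
        System.out.println(f.getSum()); // prints 5
    }
}
```

Because the getters live on the intermediate classes, a single-input function never sees "first"/"second" accessors, and a dual-input function never sees the unqualified ones.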

> Checksum method for DataSet and Graph
> -------------------------------------
>
>                 Key: FLINK-2716
>                 URL: https://issues.apache.org/jira/browse/FLINK-2716
>             Project: Flink
>          Issue Type: Improvement
>          Components: Gelly, Java API, Scala API
>    Affects Versions: master
>            Reporter: Greg Hogan
>            Assignee: Greg Hogan
>            Priority: Minor
>
> {{DataSet.count()}}, {{Graph.numberOfVertices()}}, and {{Graph.numberOfEdges()}} provide
> measures of the number of distributed data elements. New {{DataSet.checksum()}} and {{Graph.checksum()}}
> methods will summarize the content of data elements and support algorithm validation, integration
> testing, and benchmarking.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
