flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-5874) Reject arrays as keys in DataStream API to avoid inconsistent hashing
Date Fri, 10 Mar 2017 14:29:04 GMT

    [ https://issues.apache.org/jira/browse/FLINK-5874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15905174#comment-15905174 ]

ASF GitHub Bot commented on FLINK-5874:

Github user zentol commented on a diff in the pull request:

    --- Diff: flink-streaming-java/src/test/java/org/apache/flink/streaming/api/DataStreamTest.java
    @@ -906,6 +919,256 @@ public void testChannelSelectors() {
    +	// KeyBy testing
    +	/////////////////////////////////////////////////////////////
    +	@Rule
    +	public ExpectedException expectedException = ExpectedException.none();
    --- End diff --
    Well, it may be more concise, but a try-catch block makes it much more obvious that
this part of the code throws an exception we care about.
    try-catch blocks are also more common in the codebase, and I'm wondering whether we
want to introduce an inconsistency there.
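For illustration, the try-catch style the comment argues for might look like the sketch below. Note that keyByArray() and the exception type are stand-ins invented for this example, not the actual Flink API or the exception the pull request introduces:

```java
// Illustrative sketch only: keyByArray() and IllegalArgumentException are
// stand-ins for the actual DataStream.keyBy(...) call and whatever exception
// the real code throws when an array-typed key is rejected.
public class TryCatchStyle {
    static void keyByArray() {
        throw new IllegalArgumentException("Arrays are not supported as keys");
    }

    public static void main(String[] args) {
        boolean caught = false;
        try {
            keyByArray();
            // in a JUnit test, fail("expected exception") would go here
        } catch (IllegalArgumentException expected) {
            // the exception we care about is handled visibly at this point
            caught = true;
        }
        System.out.println(caught ? "caught" : "not caught");
    }
}
```

Compared to the ExpectedException rule, the expected exception and the statement that throws it sit next to each other in the same block, which is the readability point being made.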

> Reject arrays as keys in DataStream API to avoid inconsistent hashing
> ---------------------------------------------------------------------
>                 Key: FLINK-5874
>                 URL: https://issues.apache.org/jira/browse/FLINK-5874
>             Project: Flink
>          Issue Type: Bug
>          Components: DataStream API
>    Affects Versions: 1.2.0, 1.1.4
>            Reporter: Robert Metzger
>            Assignee: Kostas Kloudas
>            Priority: Blocker
> This issue has been reported on the mailing list twice:
> - http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Previously-working-job-fails-on-Flink-1-2-0-td11741.html
> - http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Arrays-values-in-keyBy-td7530.html
> The problem is the following: We are using just Key[].hashCode() to compute the hash
when shuffling data. Java's default hashCode() implementation for arrays does not take the
array's contents into account, only the object's memory address.
> This leads to different hash codes on the sender and receiver side.
> In Flink 1.1 this means that the data is shuffled randomly instead of keyed, and in Flink
1.2 the key-groups code detects a violation of the hashing.
> The proper fix for the problem would be to rely on Flink's {{TypeComparator}} class,
which has a type-specific hashing function. Introducing this change, however, would break
compatibility with existing code.
> I'll file a JIRA for the 2.0 changes for that fix.
> For 1.2.1 and 1.3.0 we should at least reject arrays as keys.
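The inconsistency described above can be reproduced with a small standalone snippet (plain Java, no Flink involved): two content-equal arrays, as a sender and a deserialized receiver-side copy would be, agree under the content-based Arrays.hashCode() but not under the identity-based default hashCode().

```java
import java.util.Arrays;

// Standalone demonstration (not Flink code) of why Object.hashCode() on a
// Java array is unusable as a shuffle key: it hashes the reference, not the
// contents, so two content-equal array objects almost never agree.
public class ArrayHashDemo {
    public static void main(String[] args) {
        int[] sender = {1, 2, 3};
        int[] receiver = {1, 2, 3}; // e.g. a deserialized copy on the other side

        System.out.println("content-equal: " + Arrays.equals(sender, receiver));
        // Content-based hash, stable for equal contents -- what a keyed
        // shuffle needs:
        System.out.println("content hashes match: "
                + (Arrays.hashCode(sender) == Arrays.hashCode(receiver)));
        // Identity-based default hash -- almost certainly differs between the
        // two objects, which is the sender/receiver mismatch in this issue:
        System.out.println("default hashes match: "
                + (sender.hashCode() == receiver.hashCode()));
    }
}
```

A type-specific hash via {{TypeComparator}}, as proposed above, would behave like the Arrays.hashCode() line: deterministic in the key's contents on both sides of the shuffle.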

This message was sent by Atlassian JIRA
