flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-8807) ZookeeperCompleted checkpoint store can get stuck in infinite loop
Date Fri, 02 Mar 2018 16:52:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-8807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16383803#comment-16383803 ]

ASF GitHub Bot commented on FLINK-8807:
---------------------------------------

GitHub user aljoscha opened a pull request:

    https://github.com/apache/flink/pull/5623

    [FLINK-8807] Fix ZookeeperCompleted checkpoint store can get stuck in infinite loop

    Before, CompletedCheckpoint did not have proper equals()/hashCode(),
    which meant that the fixpoint condition in
    ZooKeeperCompletedCheckpointStore would never hold if at least one
    checkpoint became unreadable.
    
    This adds proper equals()/hashCode() to CompletedCheckpoint and extends
    the test to properly create new CompletedCheckpoints. Before, we were
    reusing the same CompletedCheckpoint instances, meaning that
    Object.equals()/hashCode() would make the test succeed.
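
For illustration only, here is a minimal sketch of why reusing instances can hide the problem in a test. The `Checkpoint` class and both helper methods are hypothetical stand-ins, not the actual Flink classes or test code. The sketch assumes the comparison under test is a plain `List.equals()`, which falls back to identity when the element type does not override `equals()`.

```java
import java.util.Arrays;
import java.util.List;

class TestReuseSketch {

    /** Hypothetical stand-in; deliberately has no equals()/hashCode(). */
    static final class Checkpoint {
        final long id;

        Checkpoint(long id) {
            this.id = id;
        }
    }

    /** Both lists hold the very same objects, so identity equality is enough. */
    static boolean sameInstancesCompareEqual() {
        Checkpoint a = new Checkpoint(1);
        Checkpoint b = new Checkpoint(2);
        List<Checkpoint> first = Arrays.asList(a, b);
        List<Checkpoint> second = Arrays.asList(a, b);
        return first.equals(second);
    }

    /** Each list holds freshly created objects, as after re-reading from storage. */
    static boolean freshInstancesCompareEqual() {
        List<Checkpoint> first = Arrays.asList(new Checkpoint(1), new Checkpoint(2));
        List<Checkpoint> second = Arrays.asList(new Checkpoint(1), new Checkpoint(2));
        return first.equals(second);
    }

    public static void main(String[] args) {
        System.out.println("same instances:  " + sameInstancesCompareEqual());
        System.out.println("fresh instances: " + freshInstancesCompareEqual());
    }
}
```

In this sketch the first comparison is true and the second is false, which is why a test that hands out the same objects on every read would pass even without a value-based equals().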

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/aljoscha/flink jira-8807-zookeeper-fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/5623.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5623
    
----
commit 777ddb57ee72d200d1312dc8e6dfdb52af6b9950
Author: Aljoscha Krettek <aljoscha.krettek@...>
Date:   2018-03-02T16:46:56Z

    [FLINK-8807] Fix ZookeeperCompleted checkpoint store can get stuck in infinite loop
    
    Before, CompletedCheckpoint did not have proper equals()/hashCode(),
    which meant that the fixpoint condition in
    ZooKeeperCompletedCheckpointStore would never hold if at least one
    checkpoint became unreadable.
    
    This adds proper equals()/hashCode() to CompletedCheckpoint and extends
    the test to properly create new CompletedCheckpoints. Before, we were
    reusing the same CompletedCheckpoint instances, meaning that
    Object.equals()/hashCode() would make the test succeed.

----


> ZookeeperCompleted checkpoint store can get stuck in infinite loop
> ------------------------------------------------------------------
>
>                 Key: FLINK-8807
>                 URL: https://issues.apache.org/jira/browse/FLINK-8807
>             Project: Flink
>          Issue Type: Bug
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.5.0
>            Reporter: Aljoscha Krettek
>            Priority: Blocker
>             Fix For: 1.5.0
>
>
> This code: https://github.com/apache/flink/blob/9071e3befb8c279f73c3094c9f6bddc0e7cce9e5/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/ZooKeeperCompletedCheckpointStore.java#L201
> can be stuck forever if at least one checkpoint is not readable because {{CompletedCheckpoint}}
> does not have a proper {{equals()}}/{{hashCode()}} anymore.
> We have to fix this and also add a unit test that verifies the loop still works if we
> make one snapshot unreadable.
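
As a rough illustration of the failure mode described above, the following sketch models a retry loop that stops once two consecutive read attempts return the same set of checkpoints. It is a simplified assumption about the shape of such a loop, not the code at the linked line. All names (`FixpointSketch`, `IdentityCheckpoint`, `ValueCheckpoint`, `Reader`, `recover`) are hypothetical, and the `maxAttempts` cap exists only so the sketch terminates.

```java
import java.util.ArrayList;
import java.util.List;

class FixpointSketch {

    /** Hypothetical stand-in without equals()/hashCode(): compared by identity. */
    static final class IdentityCheckpoint {
        final long id;

        IdentityCheckpoint(long id) {
            this.id = id;
        }
    }

    /** Hypothetical stand-in with value-based equals()/hashCode(). */
    static final class ValueCheckpoint {
        final long id;

        ValueCheckpoint(long id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof ValueCheckpoint && ((ValueCheckpoint) o).id == id;
        }

        @Override
        public int hashCode() {
            return Long.hashCode(id);
        }
    }

    /** Hypothetical reader; returns null when a checkpoint cannot be read. */
    interface Reader<T> {
        T read(long id);
    }

    /**
     * Re-reads all checkpoints until either everything was readable or two
     * consecutive attempts produced equal results (the fixpoint). Returns the
     * number of attempts made, capped at maxAttempts for this sketch.
     */
    static <T> int recover(long[] ids, Reader<T> reader, int maxAttempts) {
        List<T> last = new ArrayList<>();
        List<T> current = new ArrayList<>();
        int attempts = 0;
        do {
            attempts++;
            last.clear();
            last.addAll(current);
            current.clear();
            for (long id : ids) {
                T checkpoint = reader.read(id);
                if (checkpoint != null) {
                    current.add(checkpoint);
                }
            }
        } while (current.size() != ids.length
                && !last.equals(current)
                && attempts < maxAttempts);
        return attempts;
    }

    public static void main(String[] args) {
        long[] ids = {1L, 2L, 3L};
        // Checkpoint 2 is unreadable; every read creates a fresh object.
        int identity = FixpointSketch.<IdentityCheckpoint>recover(
                ids, id -> id == 2L ? null : new IdentityCheckpoint(id), 10);
        int value = FixpointSketch.<ValueCheckpoint>recover(
                ids, id -> id == 2L ? null : new ValueCheckpoint(id), 10);
        System.out.println("identity equality attempts: " + identity); // hits the cap of 10
        System.out.println("value equality attempts:    " + value);    // stops after 2
    }
}
```

With identity equality, freshly read objects never compare equal to those from the previous attempt, so only the artificial cap ends the loop. With value-based equality the second attempt matches the first and the loop exits.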



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
