flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-5942) Harden ZooKeeperStateHandleStore to deal with corrupted data
Date Fri, 10 Mar 2017 15:19:04 GMT

    [ https://issues.apache.org/jira/browse/FLINK-5942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15905242#comment-15905242 ]

ASF GitHub Bot commented on FLINK-5942:

Github user tillrohrmann commented on a diff in the pull request:

    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/zookeeper/ZooKeeperStateHandleStore.java
    @@ -240,7 +241,7 @@ public int exists(String pathInZooKeeper) throws Exception {
     		try {
     			return InstantiationUtil.deserializeObject(data, Thread.currentThread().getContextClassLoader());
     		} catch (IOException | ClassNotFoundException e) {
    -			throw new Exception("Failed to deserialize state handle from ZooKeeper data from "
    +			throw new FlinkIOException("Failed to deserialize state handle from ZooKeeper data from " +
    --- End diff --
    The idea was to switch slowly to the new `FlinkExceptions`. However, what I just realized is that `FlinkIOException` does not inherit from `IOException`, which is a problem if you want to catch all `IOException`s. I will revert it back to `IOException`.
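
The concern above can be illustrated with a small sketch. The nested `FlinkIOException` class below is a hypothetical stand-in for the exception type discussed in the diff: because it extends `Exception` directly rather than `IOException`, a `catch (IOException e)` handler will not see it.

```java
import java.io.IOException;

public class CatchDemo {

    // Hypothetical stand-in: extends Exception directly, NOT IOException,
    // mirroring the inheritance problem described in the review comment.
    static class FlinkIOException extends Exception {
        FlinkIOException(String message) {
            super(message);
        }
    }

    // Simulated deserialization that may fail either way; declared to throw
    // both types so the catch clauses below compile.
    static Object deserialize(byte[] data) throws IOException, FlinkIOException {
        if (data == null || data.length == 0) {
            throw new FlinkIOException("corrupted ZooKeeper data");
        }
        return new String(data);
    }

    public static void main(String[] args) {
        try {
            deserialize(new byte[0]);
        } catch (IOException e) {
            // Never reached: FlinkIOException is not an IOException.
            System.out.println("caught as IOException");
        } catch (FlinkIOException e) {
            System.out.println("escaped the IOException handler: " + e.getMessage());
        }
    }
}
```

Reverting to `IOException` (or making the Flink exception extend `IOException`) avoids this: existing catch-all `IOException` handlers keep working unchanged.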

> Harden ZooKeeperStateHandleStore to deal with corrupted data
> ------------------------------------------------------------
>                 Key: FLINK-5942
>                 URL: https://issues.apache.org/jira/browse/FLINK-5942
>             Project: Flink
>          Issue Type: Bug
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.2.0, 1.1.4, 1.3.0
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
> The {{ZooKeeperStateHandleStore}} cannot handle corrupted Znode data. When calling
> {{ZooKeeperStateHandleStore.getAll}} or {{getAllSortedByName}} and reading a node with
> corrupted data, the whole operation will fail. In such a situation, Flink won't be able to
> recover because it will read the same corrupted Znodes over and over again (in the recovery
> case). Therefore, I propose to ignore Znodes whose data cannot be read.
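The proposed behaviour can be sketched as follows. This is not the actual Flink implementation: the in-memory `Map` stands in for ZooKeeper (the real `ZooKeeperStateHandleStore` reads Znode data via Curator), and `getAllTolerant` is a hypothetical name. The point is that a deserialization failure on one node is logged and skipped instead of aborting the whole `getAll` operation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TolerantStore {

    // Iterate over all child nodes and skip (rather than fail on) entries
    // whose data cannot be deserialized.
    static List<Object> getAllTolerant(Map<String, byte[]> znodes) {
        List<Object> handles = new ArrayList<>();
        for (Map.Entry<String, byte[]> entry : znodes.entrySet()) {
            try {
                handles.add(deserialize(entry.getValue()));
            } catch (IOException | ClassNotFoundException e) {
                // Corrupted Znode: log and ignore instead of aborting.
                System.err.println("Ignoring corrupted Znode " + entry.getKey() + ": " + e);
            }
        }
        return handles;
    }

    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    static byte[] serialize(Serializable value) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(value);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> znodes = new LinkedHashMap<>();
        znodes.put("/checkpoints/1", serialize("handle-1"));
        znodes.put("/checkpoints/2", new byte[] {0x00, 0x01}); // corrupted data
        znodes.put("/checkpoints/3", serialize("handle-3"));
        // The corrupted node is skipped; the two valid handles are returned.
        System.out.println(getAllTolerant(znodes));
    }
}
```

The trade-off is that silently dropped state handles may leak the state they point to, so a real implementation would also need to surface or clean up the corrupted nodes.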

This message was sent by Atlassian JIRA
