flink-issues mailing list archives

From "Stephan Ewen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-5036) Perform the grouping of keys in restoring instead of checkpointing
Date Wed, 09 Nov 2016 08:45:58 GMT

    [ https://issues.apache.org/jira/browse/FLINK-5036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15650354#comment-15650354 ]

Stephan Ewen commented on FLINK-5036:

Actually, the checkpointing operation is very cheap, as the data is pre-organized into key
groups already.
There was quite a long design process in getting this fleshed out and it seems to work well.
I think we should not change this.
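
To illustrate why a snapshot is cheap when the data is already organized by key group, here is a
minimal sketch. It is not Flink's actual backend code: the class KeyGroupSketch, the modulo-based
keyGroupFor assignment, and the "keyGroup|key" string layout are all assumptions made for
illustration. The idea shown is only that, when entries are stored ordered by key group, writing
out one key group is a contiguous range scan with no regrouping step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Illustrative only; names and layout are hypothetical, not Flink's implementation. */
final class KeyGroupSketch {

    // Sorted store whose keys are "keyGroup|userKey", so each key group is contiguous.
    private final TreeMap<String, String> store = new TreeMap<>();
    private final int maxParallelism;

    KeyGroupSketch(int maxParallelism) {
        this.maxParallelism = maxParallelism;
    }

    /** Assumed assignment: hash of the key modulo the number of key groups. */
    static int keyGroupFor(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    /** Zero-padded key-group prefix keeps lexicographic order equal to numeric order. */
    private static String composite(int keyGroup, String key) {
        return String.format("%05d|%s", keyGroup, key);
    }

    /** The grouping cost is paid here, once per write, not at snapshot time. */
    void put(String key, String value) {
        store.put(composite(keyGroupFor(key, maxParallelism), key), value);
    }

    /** Snapshot of one key group: a plain range scan over the sorted store. */
    List<String> snapshotKeyGroup(int keyGroup) {
        String from = composite(keyGroup, "");
        String to = composite(keyGroup + 1, "");
        return new ArrayList<>(store.subMap(from, true, to, false).values());
    }

    public static void main(String[] args) {
        KeyGroupSketch state = new KeyGroupSketch(4);
        state.put("a", "1");
        state.put("b", "2");
        state.put("e", "3");
        System.out.println(state.snapshotKeyGroup(1));
    }
}
```

In this sketch the keys "a" and "e" fall into key group 1, so the snapshot of that group reads
exactly those two entries and touches nothing else.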

As a general design thought:
Fast recovery is very important. A system that checkpoints slightly faster but where recovery
takes much longer is not desirable. It misses more SLAs than a system that has slightly higher
checkpoint overhead but faster recovery.

> Perform the grouping of keys in restoring instead of checkpointing
> ------------------------------------------------------------------
>                 Key: FLINK-5036
>                 URL: https://issues.apache.org/jira/browse/FLINK-5036
>             Project: Flink
>          Issue Type: Bug
>          Components: State Backends, Checkpointing
>            Reporter: Xiaogang Shi
> Whenever taking snapshots of {{RocksDBKeyedStateBackend}}, the values in the states will
> be written onto different files according to their key groups. The procedure is very costly
> when the states are very big.
> Given that the snapshot operations will be performed much more frequently than restoring,
> we can leave the key groups as they are to improve the overall performance. In other words,
> we can perform the grouping of keys in restoring instead of in checkpointing.
> I think, the implementation will be very similar to the restoring of non-partitioned
> states. Each task will receive a collection of snapshots each of which contains a set of key
> groups. Each task will restore its states from the given snapshots by picking values in assigned
> key groups.
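
The restore procedure proposed in the issue description could look roughly like the sketch below.
Everything in it is hypothetical: the class RestoreSketch, the even-split keyGroupRange formula,
and the modulo-based keyGroupFor are assumptions for illustration and are not taken from Flink.
It shows that when snapshots are not grouped, each task has to scan every entry of every snapshot
it receives and keep only the values whose key group it owns, so restore cost grows with the
total snapshot size rather than with the task's own share.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only; a sketch of restore-time grouping, not Flink's implementation. */
final class RestoreSketch {

    /** Assumed even split: inclusive {first, last} key-group range owned by a task. */
    static int[] keyGroupRange(int taskIndex, int parallelism, int maxParallelism) {
        int first = taskIndex * maxParallelism / parallelism;
        int last = (taskIndex + 1) * maxParallelism / parallelism - 1;
        return new int[] {first, last};
    }

    /** Assumed assignment: hash of the key modulo the number of key groups. */
    static int keyGroupFor(String key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    /** Scans all entries of all snapshots and keeps those in the task's key-group range. */
    static Map<String, String> restore(
            List<Map<String, String>> snapshots,
            int taskIndex,
            int parallelism,
            int maxParallelism) {
        int[] range = keyGroupRange(taskIndex, parallelism, maxParallelism);
        Map<String, String> state = new TreeMap<>();
        for (Map<String, String> snapshot : snapshots) {
            for (Map.Entry<String, String> entry : snapshot.entrySet()) {
                int keyGroup = keyGroupFor(entry.getKey(), maxParallelism);
                if (keyGroup >= range[0] && keyGroup <= range[1]) {
                    state.put(entry.getKey(), entry.getValue());
                }
            }
        }
        return state;
    }

    public static void main(String[] args) {
        List<Map<String, String>> snapshots =
                List.of(Map.of("a", "1", "b", "2"), Map.of("c", "3", "d", "4"));
        System.out.println(restore(snapshots, 0, 2, 4));
        System.out.println(restore(snapshots, 1, 2, 4));
    }
}
```

With pre-grouped snapshots, the filtering loop would be unnecessary, because a task could read
only the key groups in its range.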

This message was sent by Atlassian JIRA