flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Stefan Richter (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-5036) Perform the grouping of keys in restoring instead of checkpointing
Date Wed, 09 Nov 2016 09:39:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-5036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15650460#comment-15650460

Stefan Richter commented on FLINK-5036:

>From the description it is not entirely clear if you are discussing a theoretical or a
concrete problem that you observed. If there is an observable performance problem, could you
please provide some backing measurements (log outputs), the Flink version, and information
about your keys and values to help us figure out the cause and extend of the problem?

If we are talking about a theoretical problem, the way you describe the effect of key-groups
on snapshotting in the {{RocksDBKeyedStateBackend}} is not accurate. What was actually changed
for key-groups in the current master is that each key in RocksDB is now prefixed by it's corresponding
key-group ID (1-2 byte, depending on maxParallelism). This allows us to iterate all key-value
pairs in key-grouped order at snapshot time, similar to how we previously iterated them in
key-order. Furthermore, all key-groups go to the same file and are written consecutively without
random IOs, and a small index of key-group-id -> offset is created. From this point of
view, the performance impact of key-groups on snapshot time should (hopefully) be marginal.

As another remark, non-partitioned state is snapshotted/restored in a similar way. Avoiding
key-groups at snapshot times has also more problematic implications for restores than you
consider in the description. For example, if the new parallelism is not a multiple of the
old parallelism, we effectively have to read and filter the completed keyed state for each
task that works on it.

> Perform the grouping of keys in restoring instead of checkpointing
> ------------------------------------------------------------------
>                 Key: FLINK-5036
>                 URL: https://issues.apache.org/jira/browse/FLINK-5036
>             Project: Flink
>          Issue Type: Bug
>          Components: State Backends, Checkpointing
>            Reporter: Xiaogang Shi
> Whenever taking snapshots of {{RocksDBKeyedStateBackend}}, the values in the states will
be written onto different files according to their key groups. The procedure is very costly
when the states are very big. 
> Given that the snapshot operations will be performed much more frequently than restoring,
we can leave the key groups as they are to improve the overall performance. In other words,
we can perform the grouping of keys in restoring instead of in checkpointing.
> I think, the implementation will be very similar to the restoring of non-partitioned
states. Each task will receive a collection of snapshots each of which contains a set of key
groups. Each task will restore its states from the given snapshots by picking values in assigned
key groups.

This message was sent by Atlassian JIRA

View raw message