flink-issues mailing list archives

From "Flink Jira Bot (Jira)" <j...@apache.org>
Subject [jira] [Updated] (FLINK-21726) Fix checkpoint stuck
Date Sun, 30 May 2021 23:16:01 GMT

     [ https://issues.apache.org/jira/browse/FLINK-21726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flink Jira Bot updated FLINK-21726:
-----------------------------------
    Labels: auto-deprioritized-critical stale-major  (was: auto-deprioritized-critical)

I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I help the community
manage its development. I see this issue has been marked as Major but is unassigned, and neither
it nor its Sub-Tasks have been updated for 30 days. I have gone ahead and added a "stale-major"
label to the issue. If this ticket is a Major, please either assign yourself or give an update.
Afterwards, please remove the label, or in 7 days the issue will be deprioritized.


> Fix checkpoint stuck
> --------------------
>
>                 Key: FLINK-21726
>                 URL: https://issues.apache.org/jira/browse/FLINK-21726
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / State Backends
>    Affects Versions: 1.11.3, 1.12.2, 1.13.0
>            Reporter: fanrui
>            Priority: Major
>              Labels: auto-deprioritized-critical, stale-major
>             Fix For: 1.14.0
>
>
> h1. 1. Bug description:
> When a RocksDB checkpoint runs, it may get stuck in the `WaitUntilFlushWouldNotStallWrites` method.
> h1. 2. Simple analysis of the reasons:
> h2. 2.1 Configuration parameters:
>  
> {code:java}
> # Flink yaml:
> state.backend.rocksdb.predefined-options: SPINNING_DISK_OPTIMIZED_HIGH_MEM
> state.backend.rocksdb.compaction.style: UNIVERSAL
> # corresponding RocksDB config
> Compaction Style : Universal 
> max_write_buffer_number : 4
> min_write_buffer_number_to_merge : 3{code}
> Checkpoints are usually very fast. When a checkpoint is executed, `WaitUntilFlushWouldNotStallWrites`
> is called. If there are 2 immutable MemTables, which is fewer than `min_write_buffer_number_to_merge`,
> they will not be flushed, but execution still enters this code:
>  
> {code:java}
> // method: GetWriteStallConditionAndCause
> if (mutable_cf_options.max_write_buffer_number > 3 &&
>     num_unflushed_memtables >=
>         mutable_cf_options.max_write_buffer_number - 1) {
>   return {WriteStallCondition::kDelayed, WriteStallCause::kMemtableLimit};
> }
> {code}
> code link: [https://github.com/facebook/rocksdb/blob/fbed72f03c3d9e4fdca3e5993587ef2559ba6ab9/db/column_family.cc#L847]
> The checkpoint assumes a FlushJob is pending, but none has been scheduled, so it waits indefinitely.
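
The mismatch described above can be sketched in isolation. This is a simplified model, not RocksDB code: only the memtable check quoted in the issue is reproduced, and `MemtableStallCondition` and `FlushWouldBeScheduled` are hypothetical helper names. It also assumes the caller evaluates the condition for one extra immutable MemTable (2 existing + 1), which is an assumption not stated in the issue text.

```cpp
#include <cassert>

// Simplified model of the memtable branch quoted above; the real
// GetWriteStallConditionAndCause has additional branches that are omitted.
enum class StallCondition { kNormal, kDelayed };

StallCondition MemtableStallCondition(int max_write_buffer_number,
                                      int num_unflushed_memtables) {
  assert(max_write_buffer_number > 0);
  if (max_write_buffer_number > 3 &&
      num_unflushed_memtables >= max_write_buffer_number - 1) {
    return StallCondition::kDelayed;
  }
  return StallCondition::kNormal;
}

// Hypothetical helper: a flush is only scheduled once enough immutable
// MemTables have accumulated to be merged.
bool FlushWouldBeScheduled(int num_immutable,
                           int min_write_buffer_number_to_merge) {
  return num_immutable >= min_write_buffer_number_to_merge;
}
```

With the settings from section 2.1 (`max_write_buffer_number` = 4, `min_write_buffer_number_to_merge` = 3), `MemtableStallCondition(4, 3)` reports a delayed state while `FlushWouldBeScheduled(2, 3)` is false, so the waiter has nothing to wake it up.
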
> h2. 2.2 Solution:
> Tighten the wait condition: wait only when the number of immutable MemTables >= `min_write_buffer_number_to_merge`.
> The rocksdb community has fixed this bug, link: [https://github.com/facebook/rocksdb/pull/7921]
> h2. 2.3 Code that can reproduce the bug:
> [https://github.com/1996fanrui/fanrui-learning/blob/flink-1.12/module-java/src/main/java/com/dream/rocksdb/RocksDBCheckpointStuck.java]
> h1. 3. Interesting point
> This bug is triggered only when the number of sorted runs >= `level0_file_num_compaction_trigger`,
> because there is an early `break` in `WaitUntilFlushWouldNotStallWrites`:
> {code:java}
> if (cfd->imm()->NumNotFlushed() <
>         cfd->ioptions()->min_write_buffer_number_to_merge &&
>     vstorage->l0_delay_trigger_count() <
>         mutable_cf_options.level0_file_num_compaction_trigger) {
>   break;
> }
> {code}
> code link: [https://github.com/facebook/rocksdb/blob/fbed72f03c3d9e4fdca3e5993587ef2559ba6ab9/db/db_impl/db_impl_compaction_flush.cc#L1974]
> With Universal compaction, `l0_delay_trigger_count() >= level0_file_num_compaction_trigger` can hold,
> so the `break` is skipped and the bug is triggered.
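
The early-exit check from section 3 can be restated as a standalone predicate to show why the sorted-run count matters. `WaitLoopBreaksEarly` is a hypothetical name, and the trigger value of 4 used in the note below is illustrative only; the issue does not state the configured `level0_file_num_compaction_trigger`.

```cpp
// Hypothetical predicate mirroring the quoted check: the wait is abandoned
// only if BOTH the immutable MemTable count and the L0 count are below
// their respective thresholds.
bool WaitLoopBreaksEarly(int num_not_flushed,
                         int min_write_buffer_number_to_merge,
                         int l0_delay_trigger_count,
                         int level0_file_num_compaction_trigger) {
  return num_not_flushed < min_write_buffer_number_to_merge &&
         l0_delay_trigger_count < level0_file_num_compaction_trigger;
}
```

For example, with 2 immutable MemTables and a merge threshold of 3, `WaitLoopBreaksEarly(2, 3, 1, 4)` is true and the checkpoint proceeds, whereas `WaitLoopBreaksEarly(2, 3, 4, 4)` is false, which is the case where the wait can hang.
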



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
