cassandra-commits mailing list archives

From "Blake Eggleston (Jira)" <>
Subject [jira] [Commented] (CASSANDRA-15367) Memtable memory allocations may deadlock
Date Wed, 15 Jan 2020 23:15:00 GMT


Blake Eggleston commented on CASSANDRA-15367:

I've been trying to work out exactly how this deadlock can occur, based on your description.
Could the deadlock be restated like this?

 For a given partition key:
 * a write belongs to an {{OpOrder.Group}} started before the barrier set on Memtable1 (M1), but has a replay position after the final replay position assigned to M1 before it flushes.
 * It is therefore forwarded to M2, while still blocking the flush of M1.
 * M2 has another in-flight write for this partition; the partition is contended, so that write is holding the lock.
 ** It can’t progress because it can’t allocate memory (in part because M1 can’t flush).
 ** It doesn’t degrade to allocating on heap, because its OpOrder group isn’t blocking anything.
 * The write stage becomes saturated with writes deadlocked like these, so no further writes can be processed.
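
If so, the cycle can be made explicit with a toy wait-for graph. This is a sketch only: the node names (W1, W2, "memory", "M1-flush") are illustrative labels for the actors described above, not Cassandra classes.

```java
import java.util.*;

// Hypothetical wait-for model of the scenario described above.
// An edge a -> b means "a is waiting on b". A cycle means deadlock.
public class DeadlockCycleSketch {
    static boolean hasCycle(Map<String, String> waitsOn) {
        for (String start : waitsOn.keySet()) {
            Set<String> seen = new HashSet<>();
            String cur = start;
            // Follow the single outgoing wait edge until we fall off the
            // graph (no cycle from this start) or revisit a node (cycle).
            while (cur != null && seen.add(cur)) {
                cur = waitsOn.get(cur);
            }
            if (cur != null) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, String> waitsOn = new LinkedHashMap<>();
        waitsOn.put("W2 (C2, holds partition lock)", "memory");      // can't allocate
        waitsOn.put("memory", "M1-flush");                           // freed only when M1 flushes
        waitsOn.put("M1-flush", "W1 (C1, pre-barrier op)");          // barrier awaits W1's group
        waitsOn.put("W1 (C1, pre-barrier op)", "W2 (C2, holds partition lock)"); // contends the lock
        System.out.println("cycle: " + hasCycle(waitsOn)); // prints "cycle: true"
    }
}
```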



> Memtable memory allocations may deadlock
> ----------------------------------------
>                 Key: CASSANDRA-15367
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Commit Log, Local/Memtable
>            Reporter: Benedict Elliott Smith
>            Assignee: Benedict Elliott Smith
>            Priority: Normal
>             Fix For: 4.0, 2.2.x, 3.0.x, 3.11.x
> * Under heavy contention, we guard modifications to a partition with a mutex, for the lifetime of the memtable.
> * Memtables block for the completion of all {{OpOrder.Group}} started before their flush.
> * Memtables permit operations from this cohort to fall through to the following Memtable, in order to guarantee a precise commitLogUpperBound.
> * Memtable memory limits may be lifted for operations in the first cohort, since they block flush (and hence block future memory allocation).
> With very unfortunate scheduling:
> * A contended partition may rapidly escalate to a mutex.
> * The system may reach memory limits that prevent allocations for the new Memtable’s cohort (C2).
> * An operation from C2 may hold the mutex when this occurs.
> * Operations from a prior Memtable’s cohort (C1), for a contended partition, may fall through to the next Memtable.
> * The operations from C1 may execute after the above state is encountered by those from C2.
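
For illustration, the hazardous ordering in the last few bullets (holding the partition mutex while blocking on an exhausted memory pool) can be reduced to the following hedged sketch. Here a {{Semaphore}} stands in for the memtable memory limit and a {{ReentrantLock}} for the partition mutex; none of this is Cassandra's actual code, and a timeout is used so the demonstration terminates instead of truly deadlocking.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative model: a bounded "memory pool" and a partition mutex,
// showing the hazardous ordering "acquire lock, then block for memory".
public class LockThenAllocate {
    // Zero permits: the pool is exhausted, and only a flush would release one.
    static final Semaphore memoryPermits = new Semaphore(0);
    static final ReentrantLock partitionLock = new ReentrantLock();

    // Returns true if the write "allocated" in time, false if it timed out
    // waiting for memory while still holding the partition lock.
    static boolean writeUnderLock(long timeoutMs) throws InterruptedException {
        partitionLock.lock();
        try {
            // The hazard: blocking on allocation while holding the lock.
            // The flush that would release a permit may itself be waiting
            // (transitively) on this very lock.
            return memoryPermits.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
        } finally {
            partitionLock.unlock();
        }
    }

    public static void main(String[] args) throws Exception {
        boolean ok = writeUnderLock(100);
        System.out.println("allocation succeeded: " + ok); // prints "allocation succeeded: false"
    }
}
```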

This message was sent by Atlassian Jira
