cassandra-commits mailing list archives

From Jan Urbański (JIRA) <>
Subject [jira] [Commented] (CASSANDRA-13123) Draining a node might fail to delete all inactive commitlogs
Date Thu, 19 Jan 2017 17:03:26 GMT


Jan Urbański commented on CASSANDRA-13123:

[~jasobrown] I haven't had the chance to try this out in production yet; I'll try to do that
tomorrow. The initial commitlog replay currently takes up to two minutes on each of our nodes.
If I understand correctly, after a drain all commitlogs except at most two would be deleted,
so the initial replay phase would shrink to essentially zero. The shutdown phase might take
a bit longer, of course, because it will have to wait for those commitlogs to be deleted.

The exact improvement depends on the number of CLs left behind after a drain: on machines
with heavily contended disks it can be a lot, while on lightly loaded ones it might be zero.

As for when we drain: we do it on every restart, since it's part of our restart procedure.

> Draining a node might fail to delete all inactive commitlogs
> ------------------------------------------------------------
>                 Key: CASSANDRA-13123
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local Write-Read Paths
>            Reporter: Jan Urbański
>            Assignee: Jan Urbański
>             Fix For: 3.8
>         Attachments: 13123-2.2.8.txt, 13123-3.0.10.txt, 13123-3.9.txt, 13123-trunk.txt
> After issuing a drain command, it's possible that not all of the inactive commitlogs are removed.
> The drain command shuts down the CommitLog instance, which in turn shuts down the CommitLogSegmentManager. This has the effect of discarding any pending management tasks it might have, such as the removal of inactive commitlogs.
> This in turn leads to an excessive number of commitlogs being left behind after a drain and a lengthy recovery after a restart. With a fleet of dozens of nodes, each of them leaving several GB of commitlogs after a drain and taking up to two minutes to replay them on restart, the additional time required to restart the entire fleet becomes noticeable.
> This problem is not present in 3.x or trunk because of the CLSM rewrite done in CASSANDRA-8844.
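A minimal Java sketch of the failure mode described in the issue, assuming a segment manager that runs its management tasks (such as deleting inactive segments) on a single background thread. `SegmentManager`, `scheduleDeletion`, `segmentsLeftAfterDrain` and the segment file names are hypothetical illustrations, not Cassandra's actual `CommitLogSegmentManager` code; `ExecutorService.shutdownNow()` merely stands in for a shutdown path that discards queued tasks, and a latch stands in for a slow, contended disk.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical model only: not Cassandra's CommitLogSegmentManager.
public class DrainSketch {

    /** Deletes inactive segments on a single background "management" thread. */
    static final class SegmentManager {
        final Set<String> segmentsOnDisk = ConcurrentHashMap.newKeySet();
        final ExecutorService managementThread = Executors.newSingleThreadExecutor();

        void scheduleDeletion(String segment) {
            managementThread.execute(() -> segmentsOnDisk.remove(segment));
        }

        /** Buggy shutdown: interrupts the running task and drops every queued deletion. */
        void shutdownDiscardingTasks() {
            managementThread.shutdownNow();
            awaitStop();
        }

        /** Fixed shutdown: accepts no new tasks but lets queued deletions finish. */
        void shutdownAfterPendingTasks() {
            managementThread.shutdown();
            awaitStop();
        }

        private void awaitStop() {
            try {
                managementThread.awaitTermination(10, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Returns how many inactive segments are still "on disk" after the simulated drain. */
    static int segmentsLeftAfterDrain(int inactiveSegments, boolean waitForPendingTasks) {
        SegmentManager manager = new SegmentManager();
        CountDownLatch diskBusy = new CountDownLatch(1);

        // Keep the management thread busy so deletions pile up in its queue.
        manager.managementThread.execute(() -> {
            try {
                diskBusy.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        for (int i = 0; i < inactiveSegments; i++) {
            String name = "CommitLog-" + i + ".log";
            manager.segmentsOnDisk.add(name);
            manager.scheduleDeletion(name);
        }

        if (waitForPendingTasks) {
            diskBusy.countDown();                // the slow I/O eventually completes
            manager.shutdownAfterPendingTasks(); // queued deletions still run
        } else {
            manager.shutdownDiscardingTasks();   // queued deletions are discarded
        }
        return manager.segmentsOnDisk.size();
    }

    public static void main(String[] args) {
        System.out.println("left after drain (tasks discarded): " + segmentsLeftAfterDrain(5, false));
        System.out.println("left after drain (tasks awaited):   " + segmentsLeftAfterDrain(5, true));
    }
}
```

In this model, discarding the queue leaves all five segments behind, while waiting for pending tasks removes them all. That mirrors the trade-off mentioned in the comment: the drain itself takes longer because it waits for the deletions, but the next startup has little or nothing to replay.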

This message was sent by Atlassian JIRA
