cassandra-commits mailing list archives

From "Benedict (JIRA)" <>
Subject [jira] [Assigned] (CASSANDRA-14605) Major compaction of LCS tables very slow
Date Fri, 27 Jul 2018 12:06:00 GMT


Benedict reassigned CASSANDRA-14605:

    Assignee: Benedict

> Major compaction of LCS tables very slow
> ----------------------------------------
>                 Key: CASSANDRA-14605
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Compaction
>         Environment: AWS, i3.4xlarge instance (very fast local nvme storage), Linux 4.13
> Cassandra 3.0.16
>            Reporter: Joseph Lynch
>            Assignee: Benedict
>            Priority: Minor
>              Labels: lcs, performance
>         Attachments: slow_major_compaction_lcs.svg
> We've recently started deploying 3.0.16 more heavily in production and today I noticed
that full compaction of LCS tables takes a much longer time than it should. In particular
it appears to be faster to convert a large dataset to STCS, run full compaction, and then
convert it to LCS (with re-leveling) than it is to just run full compaction on LCS (with re-leveling).
> I was able to get a CPU flame graph showing 50% of the major compaction's cpu time being
spent in [{{SSTableRewriter::maybeReopenEarly}}|]
calling [{{SSTableRewriter::moveStarts}}|].
> I've attached the flame graph, which was generated by running Cassandra with {{-XX:+PreserveFramePointer}},
then using jstack to find the compaction thread's native thread id (nid), which I passed to perf to record
on-cpu time:
> {noformat}
> perf record -t <compaction thread> -o <output file> -F 49 -g sleep 60 >/dev/null
> {noformat}
> I took this data and collapsed it using the steps described in [Brendan Gregg's java
in flames blogpost|] (Instructions
section) to generate the graph.
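> For anyone trying to reproduce this, the end-to-end profiling sequence looks roughly like the
following (a sketch, not part of the original report: the pid, nid, and FlameGraph script paths
are placeholders you'd substitute for your own environment):
> {noformat}
> # Find the native thread id (nid) of the compaction thread.
> # jstack prints nids in hex; convert to decimal before passing to perf.
> jstack <cassandra pid> | grep CompactionExecutor
> # e.g. nid=0x4f2  ->  1266 decimal
>
> # Sample that thread's on-cpu stacks at 49 Hz for 60 seconds (as above).
> perf record -t <compaction thread> -o perf.data -F 49 -g sleep 60 >/dev/null
>
> # Collapse the stacks and render the flame graph
> # (stackcollapse-perf.pl and flamegraph.pl come from the FlameGraph repo).
> perf script -i perf.data | ./stackcollapse-perf.pl | ./flamegraph.pl > compaction.svg
> {noformat}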
> The result is that, at least on this dataset (700GB of data compressed, 2.2TB uncompressed),
we are spending 50% of our cpu time in {{moveStarts}}, and I am not sure we need to call it
as frequently as we do. I'll see if I can come up with a clean reproduction to confirm
whether it's a general problem or specific to this dataset.

This message was sent by Atlassian JIRA
