cassandra-commits mailing list archives

From "Corentin Chary (JIRA)" <>
Subject [jira] [Created] (CASSANDRA-13038) 33% of compaction time spent in StreamingHistogram.update()
Date Tue, 13 Dec 2016 09:21:58 GMT
Corentin Chary created CASSANDRA-13038:

             Summary: 33% of compaction time spent in StreamingHistogram.update()
                 Key: CASSANDRA-13038
             Project: Cassandra
          Issue Type: Bug
          Components: Compaction
            Reporter: Corentin Chary
         Attachments: compaction-streaminghistrogram.png, profiler-snapshot.nps

With the following table, which contains a *lot* of cells:

CREATE TABLE biggraphite.datapoints_11520p_60s (
    metric uuid,
    time_start_ms bigint,
    offset smallint,
    count int,
    value double,
    PRIMARY KEY ((metric, time_start_ms), offset)
AND compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
'compaction_window_size': '6', 'compaction_window_unit': 'HOURS', 'max_threshold': '32', 'min_threshold':

Keyspace : biggraphite
        Read Count: 1822
        Read Latency: 1.8870054884742042 ms.
        Write Count: 2212271647
        Write Latency: 0.027705127678653473 ms.
        Pending Flushes: 0
                Table: datapoints_11520p_60s
                SSTable count: 47
                Space used (live): 300417555945
                Space used (total): 303147395017
                Space used by snapshots (total): 0
                Off heap memory used (total): 207453042
                SSTable Compression Ratio: 0.4955200053039823
                Number of keys (estimate): 16343723
                Memtable cell count: 220576
                Memtable data size: 17115128
                Memtable off heap memory used: 0
                Memtable switch count: 2872
                Local read count: 0
                Local read latency: NaN ms
                Local write count: 1103167888
                Local write latency: 0.025 ms
                Pending flushes: 0
                Percent repaired: 0.0
                Bloom filter false positives: 0
                Bloom filter false ratio: 0.00000
                Bloom filter space used: 105118296
                Bloom filter off heap memory used: 106547192
                Index summary off heap memory used: 27730962
                Compression metadata off heap memory used: 73174888
                Compacted partition minimum bytes: 61
                Compacted partition maximum bytes: 51012
                Compacted partition mean bytes: 7899
                Average live cells per slice (last five minutes): NaN
                Maximum live cells per slice (last five minutes): 0
                Average tombstones per slice (last five minutes): NaN
                Maximum tombstones per slice (last five minutes): 0
                Dropped Mutations: 0

According to the attached profile, about 33% of the compaction time is spent in
StreamingHistogram.update() (which is used to store the estimated tombstone drop times).

This could be caused by a huge number of different deletion times, which would make the bin
huge, but this histogram should be capped at 100 keys. It's more likely caused by the huge
number of cells.
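
To make the per-cell cost concrete, here is a minimal sketch of a capped histogram. It is not Cassandra's actual StreamingHistogram code: the class name, the method names and the merge policy (merge the two closest bins once the cap is exceeded) are assumptions made for illustration. Even with the number of bins bounded, every call pays for a sorted-map insert and, once the cap is reached, a scan over all bins, so the total work grows with the number of cells rather than with the number of bins.

```java
import java.util.TreeMap;

/** Illustrative capped histogram; hypothetical, not Cassandra's implementation. */
class CappedHistogramSketch {
    // point (e.g. a deletion time) -> number of values folded into that bin
    private final TreeMap<Long, Long> bins = new TreeMap<>();
    private final int maxBins;

    CappedHistogramSketch(int maxBins) {
        if (maxBins < 1) throw new IllegalArgumentException("maxBins must be >= 1");
        this.maxBins = maxBins;
    }

    /** Called once per value, so this cost is paid for every cell. */
    void update(long point, long count) {
        bins.merge(point, count, Long::sum);   // O(log maxBins) sorted-map insert
        if (bins.size() > maxBins) {
            mergeClosestBins();                // O(maxBins) scan
        }
    }

    /** Finds the two adjacent bins with the smallest gap and merges them. */
    private void mergeClosestBins() {
        Long prev = null, left = null, right = null;
        long smallestGap = Long.MAX_VALUE;
        for (Long p : bins.keySet()) {
            if (prev != null && p - prev < smallestGap) {
                smallestGap = p - prev;
                left = prev;
                right = p;
            }
            prev = p;
        }
        long leftCount = bins.remove(left);
        long rightCount = bins.remove(right);
        long total = leftCount + rightCount;
        // Count-weighted mean of the two points, written to avoid overflow.
        long merged = left + (right - left) * rightCount / total;
        bins.merge(merged, total, Long::sum);
    }

    int binCount() {
        return bins.size();
    }

    long totalCount() {
        long total = 0;
        for (long c : bins.values()) total += c;
        return total;
    }
}
```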

A simple solution could be to take only part of the cells into account. The fact that this
table uses TWCS also gives us an additional hint that sampling deletion times would be fine.
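
The sampling idea can be sketched as follows. Everything here is hypothetical: the class name, the fixed 1-in-N policy and the plain sorted map standing in for the histogram are assumptions, not an existing Cassandra API. Only every N-th deletion time is recorded, with a weight of N, so the estimated total stays roughly right while the number of histogram updates drops by a factor of N.

```java
import java.util.TreeMap;

/** Hypothetical 1-in-N sampler for deletion times; for illustration only. */
class SampledDropTimes {
    // Stand-in for the real histogram: deletion time -> weighted count.
    private final TreeMap<Long, Long> counts = new TreeMap<>();
    private final int sampleEvery;
    private long seen = 0;

    SampledDropTimes(int sampleEvery) {
        if (sampleEvery < 1) throw new IllegalArgumentException("sampleEvery must be >= 1");
        this.sampleEvery = sampleEvery;
    }

    /** Records one cell's deletion time, keeping only every sampleEvery-th one. */
    void record(long dropTimeSeconds) {
        if (seen++ % sampleEvery == 0) {
            // Each kept sample stands for sampleEvery cells.
            counts.merge(dropTimeSeconds, (long) sampleEvery, Long::sum);
        }
    }

    /** Number of times the underlying histogram was actually touched. */
    long histogramUpdates() {
        return (seen + sampleEvery - 1) / sampleEvery;
    }

    /** Estimated number of cells, reconstructed from the weighted samples. */
    long estimatedTotal() {
        long total = 0;
        for (long c : counts.values()) total += c;
        return total;
    }
}
```

A fixed stride is the simplest policy; if deletion times follow a periodic pattern, picking samples at random would avoid aliasing with that pattern.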

