cassandra-commits mailing list archives

From "Benedict (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-13038) 33% of compaction time spent in StreamingHistogram.update()
Date Fri, 06 Jan 2017 11:15:59 GMT


Benedict commented on CASSANDRA-13038:

So, to clarify, I was suggesting using a primitive sorted array for the histogram, and another
primitive array as a buffer of some size >= the histogram (perhaps many multiples of it).  When
the buffer overflows (or the end of the input is reached), sort the buffer, perform a linear merge
with the existing histogram to select the "largest" N buckets (or just the most evenly distributed
ones, which is a slightly different approach, but probably better), then linear merge again
to re-populate the histogram.
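
A minimal Java sketch of that buffered approach follows. It is one possible reading of the
comment, not the patch attached to the ticket and not Cassandra's StreamingHistogram. The class
name BufferedHistogram, its methods, and the choice to reduce the merged bins by splitting them
into evenly sized contiguous groups (each collapsed to its weighted mean) are all assumptions
made for illustration.

```java
import java.util.Arrays;

/** Illustrative only: hypothetical names, not Cassandra's StreamingHistogram. */
final class BufferedHistogram {
    private final int maxBins;
    private final long[] buffer;          // unsorted incoming points
    private int buffered = 0;
    private long[] points = new long[0];  // sorted bin positions
    private long[] counts = new long[0];  // counts[i] belongs to points[i]

    BufferedHistogram(int maxBins, int bufferSize) {
        if (maxBins < 1 || bufferSize < 1) throw new IllegalArgumentException("sizes must be positive");
        this.maxBins = maxBins;
        this.buffer = new long[bufferSize];
    }

    /** Appends to the buffer; the expensive work only happens when the buffer is full. */
    void update(long point) {
        buffer[buffered++] = point;
        if (buffered == buffer.length) flush();
    }

    /** Call once more at the end of the input to fold in any buffered points. */
    void flush() {
        if (buffered == 0) return;
        Arrays.sort(buffer, 0, buffered);

        // First linear merge: sorted buffer + sorted histogram, combining equal points.
        long[] mp = new long[points.length + buffered];
        long[] mc = new long[mp.length];
        int i = 0, j = 0, n = 0;
        while (i < points.length || j < buffered) {
            long p, c;
            if (j == buffered || (i < points.length && points[i] <= buffer[j])) {
                p = points[i]; c = counts[i]; i++;
            } else {
                p = buffer[j]; c = 1; j++;
            }
            if (n > 0 && mp[n - 1] == p) mc[n - 1] += c;
            else { mp[n] = p; mc[n] = c; n++; }
        }
        buffered = 0;

        if (n <= maxBins) {
            points = Arrays.copyOf(mp, n);
            counts = Arrays.copyOf(mc, n);
            return;
        }

        // Second linear pass (assumed reduction rule): split the n merged entries into
        // maxBins contiguous groups and replace each group by its weighted mean.
        long[] rp = new long[maxBins];
        long[] rc = new long[maxBins];
        for (int g = 0; g < maxBins; g++) {
            int from = (int) ((long) g * n / maxBins);
            int to = (int) ((long) (g + 1) * n / maxBins);
            long weight = 0;
            double weighted = 0;
            for (int k = from; k < to; k++) {
                weight += mc[k];
                weighted += (double) mp[k] * mc[k];
            }
            rp[g] = Math.round(weighted / weight);
            rc[g] = weight;
        }
        points = rp;
        counts = rc;
    }

    long[] points() { return points.clone(); }
    long[] counts() { return counts.clone(); }
}
```

Each update is an array store; the sort and the two linear passes are paid once per full buffer,
which is why a buffer several times larger than the histogram should amortise well.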

There's also the possibility of simply maintaining a priority queue of pairs of a delta and
its histogram bucket value (sorted on the delta only).  If we calculate the delta for each
item inserted (with its neighbours) and it is smaller than the minimum delta in the queue,
we perform an O(1) merge instead of an insertion; otherwise we remove the head of the queue,
and remove its matching item in the histogram.  This is algorithmically less efficient, and
may have worse constant factors (hard to say: fewer rounds, but worse memory behaviour),
but it is a much simpler change.
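
The priority-queue variant could be sketched roughly as below. Again this is only one
interpretation of the idea, with hypothetical names (DeltaQueueHistogram, Gap) and several
simplifying assumptions: bins live in a TreeMap rather than primitive arrays, a merge keeps the
position of the surviving bin instead of moving it to a weighted mean, stale queue entries are
discarded lazily when they reach the head, and arithmetic overflow on extreme values is ignored.

```java
import java.util.PriorityQueue;
import java.util.TreeMap;

/** Illustrative only: hypothetical names, not Cassandra's StreamingHistogram. */
final class DeltaQueueHistogram {
    /** Distance between two bins that were adjacent when the entry was queued. */
    private static final class Gap {
        final long left, right, delta;
        Gap(long left, long right) { this.left = left; this.right = right; this.delta = right - left; }
    }

    private final int maxBins;
    private final TreeMap<Long, Long> bins = new TreeMap<>();   // position -> count
    private final PriorityQueue<Gap> gaps =
            new PriorityQueue<>((a, b) -> Long.compare(a.delta, b.delta));

    DeltaQueueHistogram(int maxBins) {
        if (maxBins < 2) throw new IllegalArgumentException("maxBins must be >= 2");
        this.maxBins = maxBins;
    }

    void update(long point) {
        if (bins.containsKey(point)) { bins.merge(point, 1L, Long::sum); return; }
        Long lo = bins.lowerKey(point);
        Long hi = bins.higherKey(point);
        if (bins.size() >= maxBins) {
            long dLo = lo == null ? Long.MAX_VALUE : point - lo;
            long dHi = hi == null ? Long.MAX_VALUE : hi - point;
            Gap min = smallestValidGap();
            if (Math.min(dLo, dHi) <= min.delta) {
                // The new point is at least as close to a neighbour as the closest
                // existing pair: fold it into that neighbour instead of inserting.
                bins.merge(dLo <= dHi ? lo : hi, 1L, Long::sum);
                return;
            }
            // Otherwise make room: merge the closest existing pair (right into left).
            gaps.poll();
            long moved = bins.remove(min.right);
            bins.merge(min.left, moved, Long::sum);
            Long next = bins.higherKey(min.left);
            if (next != null) gaps.add(new Gap(min.left, next));
            lo = bins.lowerKey(point);
            hi = bins.higherKey(point);
        }
        bins.put(point, 1L);
        if (lo != null) gaps.add(new Gap(lo, point));
        if (hi != null) gaps.add(new Gap(point, hi));
    }

    /** Drops queue entries whose two bins are no longer adjacent, then returns the head. */
    private Gap smallestValidGap() {
        while (true) {
            Gap g = gaps.peek();
            Long next = bins.higherKey(g.left);
            if (bins.containsKey(g.left) && next != null && next == g.right) return g;
            gaps.poll();
        }
    }

    int binCount() { return bins.size(); }
    long countAt(long point) { return bins.getOrDefault(point, 0L); }
    long totalCount() { return bins.values().stream().mapToLong(Long::longValue).sum(); }
}
```

Because stale entries are only removed when they surface at the head, the queue can temporarily
hold more entries than there are adjacent bin pairs; a production version would need to bound that.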

> 33% of compaction time spent in StreamingHistogram.update()
> -----------------------------------------------------------
>                 Key: CASSANDRA-13038
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Compaction
>            Reporter: Corentin Chary
>            Assignee: Corentin Chary
>         Attachments: compaction-speedup.patch, compaction-streaminghistrogram.png, profiler-snapshot.nps
> With the following table, that contains a *lot* of cells: 
> {code}
> CREATE TABLE biggraphite.datapoints_11520p_60s (
>     metric uuid,
>     time_start_ms bigint,
>     offset smallint,
>     count int,
>     value double,
>     PRIMARY KEY ((metric, time_start_ms), offset)
> AND compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
> 'compaction_window_size': '6', 'compaction_window_unit': 'HOURS', 'max_threshold': '32', 'min_threshold':
> Keyspace : biggraphite
>         Read Count: 1822
>         Read Latency: 1.8870054884742042 ms.
>         Write Count: 2212271647
>         Write Latency: 0.027705127678653473 ms.
>         Pending Flushes: 0
>                 Table: datapoints_11520p_60s
>                 SSTable count: 47
>                 Space used (live): 300417555945
>                 Space used (total): 303147395017
>                 Space used by snapshots (total): 0
>                 Off heap memory used (total): 207453042
>                 SSTable Compression Ratio: 0.4955200053039823
>                 Number of keys (estimate): 16343723
>                 Memtable cell count: 220576
>                 Memtable data size: 17115128
>                 Memtable off heap memory used: 0
>                 Memtable switch count: 2872
>                 Local read count: 0
>                 Local read latency: NaN ms
>                 Local write count: 1103167888
>                 Local write latency: 0.025 ms
>                 Pending flushes: 0
>                 Percent repaired: 0.0
>                 Bloom filter false positives: 0
>                 Bloom filter false ratio: 0.00000
>                 Bloom filter space used: 105118296
>                 Bloom filter off heap memory used: 106547192
>                 Index summary off heap memory used: 27730962
>                 Compression metadata off heap memory used: 73174888
>                 Compacted partition minimum bytes: 61
>                 Compacted partition maximum bytes: 51012
>                 Compacted partition mean bytes: 7899
>                 Average live cells per slice (last five minutes): NaN
>                 Maximum live cells per slice (last five minutes): 0
>                 Average tombstones per slice (last five minutes): NaN
>                 Maximum tombstones per slice (last five minutes): 0
>                 Dropped Mutations: 0
> {code}
> It looks like a good chunk of the compaction time is lost in StreamingHistogram.update()
> (which is used to store the estimated tombstone drop times).
> This could be caused by a huge number of different deletion times, which would make the
> bin huge, but this histogram should be capped at 100 keys. It's more likely caused by the
> huge number of cells.
> A simple solution could be to take only part of the cells into account; the fact that
> this table uses TWCS also gives us an additional hint that sampling deletion times would
> be fine.
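
The sampling idea from the issue description could be sketched as a thin wrapper that forwards
only every Nth deletion time and weights it by N, so that totals stay roughly comparable. The
names SamplingUpdater and WeightedSink are hypothetical, and the fixed stride is an assumption;
the ticket does not say how samples would be chosen.

```java
/** Hypothetical sink; stands in for whatever histogram receives the samples. */
interface WeightedSink {
    void add(long point, long weight);
}

/** Illustrative only: forwards one update in every `stride`, scaled by `stride`. */
final class SamplingUpdater {
    private final int stride;
    private final WeightedSink sink;
    private long seen = 0;

    SamplingUpdater(int stride, WeightedSink sink) {
        if (stride < 1) throw new IllegalArgumentException("stride must be >= 1");
        this.stride = stride;
        this.sink = sink;
    }

    void update(long deletionTime) {
        // Updates number 0, stride, 2*stride, ... are forwarded; the rest are skipped.
        if (seen++ % stride == 0) sink.add(deletionTime, stride);
    }
}
```

With a stride of 4, ten updates result in three calls to the sink, each carrying a weight of 4.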
