cassandra-commits mailing list archives

From "Jonathan Shook (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-8371) DateTieredCompactionStrategy is always compacting
Date Sun, 14 Dec 2014 20:16:15 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246079#comment-14246079 ]

Jonathan Shook edited comment on CASSANDRA-8371 at 12/14/14 8:16 PM:
---------------------------------------------------------------------

[~Bj0rn],
The phrase "ideal scheduling" was meant to describe the case in which the data for each sstable
is compacted exactly once per window. In other words, there is only one coalescing compaction
needed for all data in the new interval once a set of smaller intervals is grouped into a
single larger interval. You describe some of the scenarios which make this more of an ideal
than an actuality in your response above. I understand that the windows are anchored at fixed
points using modulo against the timestamp. The rationale I used above actually depends on
that anchoring as an assumption; without it, you wouldn't be able to achieve the ideal compaction
scheduling of "once per interval".

I guess we need to be careful about the terms we use here. I'd favor "fixed intervals" and
"coalescing of fixed intervals". I believe my rationale on compaction load still makes sense,
unless someone has a counter-example or clarification.
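
To make the terms concrete, here is a minimal sketch of "fixed intervals" and "coalescing of fixed
intervals". It is an illustration only, not the DTCS code: the function names, the 60-second
window width, and the factor of 4 are all hypothetical. Windows are anchored by taking the
timestamp modulo the window width, so every small window falls entirely inside exactly one
larger window, which is what allows a single coalescing compaction per larger interval.

```python
def window_start(timestamp, size):
    """Start of the fixed window of width `size` that contains `timestamp`.

    Anchoring uses modulo against the timestamp, so boundaries never drift.
    """
    return timestamp - (timestamp % size)


def coalesce(small_starts, small_size, factor):
    """Group small windows by the larger fixed window they fall into.

    The larger width is small_size * factor (both hypothetical parameters),
    so each small window belongs to exactly one larger window.
    """
    large_size = small_size * factor
    groups = {}
    for start in small_starts:
        groups.setdefault(window_start(start, large_size), []).append(start)
    return groups


# Hypothetical data: write timestamps in seconds, 60-second base windows,
# four base windows coalesced into each larger interval.
timestamps = [5, 61, 130, 200, 245, 300]
small = sorted({window_start(t, 60) for t in timestamps})
groups = coalesce(small, 60, 4)

for large_start, members in sorted(groups.items()):
    print(large_start, members)
```

Under these assumptions each group would be compacted once, when its larger interval is formed;
late-arriving data for an already coalesced window is one of the cases that breaks that ideal.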





was (Author: jshook):
[~Bj0rn],
The phrase "ideal scheduling" was meant to describe the case in which the data for each sstable
is compacted exactly once per window. In other words, there is only one coalescing compaction
needed for all data once a set of smaller intervals is grouped into a single larger interval.
You describe some of the scenarios which make this more of an ideal than an actuality in your
response above. I understand that the windows are anchored at fixed points using modulo against
the timestamp. The rationale I used above actually depends on it as an assumption, otherwise
you wouldn't be able to achieve ideal compaction scheduling of "once per interval".

I guess we need to be careful about the terms we use here. I'd favor "fixed intervals" and
"coalescing of fixed intervals". I believe my rationale on compaction load still makes sense,
unless someone has a counter-example or clarification.




> DateTieredCompactionStrategy is always compacting 
> --------------------------------------------------
>
>                 Key: CASSANDRA-8371
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8371
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: mck
>            Assignee: Björn Hegerfors
>              Labels: compaction, performance
>         Attachments: java_gc_counts_rate-month.png, read-latency-recommenders-adview.png,
> read-latency.png, sstables-recommenders-adviews.png, sstables.png, vg2_iad-month.png
>
>
> Running 2.0.11 and having switched a table to [DTCS|https://issues.apache.org/jira/browse/CASSANDRA-6602]
> we've seen that disk IO and gc count increase, along with the number of reads happening in
> the "compaction" hump of cfhistograms.
> Data, and generally performance, looks good, but compactions are always happening, and
> pending compactions are building up.
> The schema for this is 
> {code}CREATE TABLE search (
>   loginid text,
>   searchid timeuuid,
>   description text,
>   searchkey text,
>   searchurl text,
>   PRIMARY KEY ((loginid), searchid)
> );{code}
> We're sitting on about 82G (per replica) across 6 nodes in 4 DCs.
> CQL executed against this keyspace, and traffic patterns, can be seen in slides 7+8 of
> https://prezi.com/b9-aj6p2esft/
> Attached are sstables-per-read and read-latency graphs from cfhistograms, and screenshots
> of our munin graphs as we have gone from STCS, to LCS (week ~44), to DTCS (week ~46).
> These screenshots are also found in the prezi on slides 9-11.
> [~pmcfadin], [~Bj0rn], 
> Can this be a consequence of occasional deleted rows, as is described under (3) in the
> description of CASSANDRA-6602?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
