cassandra-commits mailing list archives

From "Stu Hood (JIRA)" <>
Subject [jira] [Updated] (CASSANDRA-2191) Multithread across compaction buckets
Date Mon, 11 Apr 2011 06:25:05 GMT


Stu Hood updated CASSANDRA-2191:

    Attachment: 0006-Prevent-cache-saves-from-occuring-concurrently.txt

* Inlined stopTheWorld in 0005. Yes, I agree that the name sucked, but two questions are still
up for debate: whether a lock acquisition can fail on a server that is not already screwed,
and whether an abstraction is in order here.
* Removed the 'forceMajor' parameter: will open a ticket post-commit to allow for guaranteeing
that a manually triggered compaction is major
* Moved ksname/cfname into getters. I didn't do this initially because the CFS is sometimes
null, but I guess you'd get the NPE in either case
* Added an AtomicBoolean to AutoSavingCache in 0006. I really think cache saving should move to
the flush stage, since the tasks have almost identical lifetimes, and we don't really need
progress reporting for either of them
* Wrapped the IdentityHashMap into an IdentityHashSet
* Returned printCompactionStats to its former glory
* Removed OperationType from SSTableWriter.Builder's task type
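
For readers unfamiliar with the AtomicBoolean approach mentioned for 0006, here is a minimal sketch of how a flag can keep two saves from overlapping. It is not the code from the patch: CacheSaveGuard, trySave, and saveInProgress are hypothetical names, and skipping the second save (rather than queueing it) is an assumption.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for a cache that must not run two saves at once. */
class CacheSaveGuard {
    // true while a save is in progress
    private final AtomicBoolean saveInProgress = new AtomicBoolean(false);

    /**
     * Runs the save only if no other save is in flight.
     * Returns true if this call performed the save, false if it was skipped.
     */
    boolean trySave(Runnable save) {
        // compareAndSet flips false -> true atomically, so only one caller wins
        if (!saveInProgress.compareAndSet(false, true)) {
            return false;
        }
        try {
            save.run();
            return true;
        } finally {
            // always release the flag, even if the save throws
            saveInProgress.set(false);
        }
    }
}

class CacheSaveGuardDemo {
    public static void main(String[] args) {
        CacheSaveGuard guard = new CacheSaveGuard();
        // A save attempted while another save is running is rejected.
        boolean outer = guard.trySave(() -> {
            boolean nested = guard.trySave(() -> {});
            System.out.println("nested save ran: " + nested);
        });
        System.out.println("outer save ran: " + outer);
    }
}
```

The flag is cleared in a finally block so that a failed save does not block every later one.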

Thanks! CASSANDRA-2156 has been rebased as well.

> Multithread across compaction buckets
> -------------------------------------
>                 Key: CASSANDRA-2191
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Stu Hood
>            Priority: Critical
>              Labels: compaction
>             Fix For: 0.8
>         Attachments: 0001-Add-a-compacting-set-to-DataTracker.txt,
>                      0002-Use-the-compacting-set-of-sstables-to-schedule-multith.txt,
>                      0003-Expose-multiple-compactions-via-JMX-and-a-concrete-ser.txt,
>                      0004-Allow-multithread-compaction-to-be-disabled.txt,
>                      0005-Acquire-the-writeLock-for-major-cleanup-scrub-in-order.txt,
>                      0006-Prevent-cache-saves-from-occuring-concurrently.txt
> This ticket overlaps with CASSANDRA-1876 to a degree, but the approaches and reasoning
> are different enough to open a separate issue.
> The problem with compactions currently is that they compact the set of sstables that
> existed the moment the compaction started. This means that for longer running compactions
> (even when running as fast as possible on the hardware), a very large number of new sstables
> might be created in the meantime. We have observed this proliferation of sstables killing
> performance during major/high-bucketed compactions.
> One approach would be to pause compactions in upper buckets (containing larger files)
> when compactions in lower buckets become possible. While this would likely solve the problem
> with read performance, it does not actually help us perform compaction any faster, which is
> a reasonable requirement for other situations.
> Instead, we need to be able to perform any compactions that are currently required in
> parallel, independent of what bucket they might be in.
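
The idea of running compactions for different buckets in parallel can be illustrated with a small sketch built around a shared "compacting set", as the attachment names 0001 and 0002 suggest. This is an illustration under assumptions, not the patch itself: CompactingSet, tryMarkCompacting, unmarkCompacting, and the string sstable names are all hypothetical, and the actual merge is replaced by a placeholder.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical tracker of sstables that are currently being compacted. */
class CompactingSet {
    private final Set<String> compacting = new HashSet<>();

    /** Atomically claims all candidates, or none if any is already claimed. */
    synchronized boolean tryMarkCompacting(List<String> candidates) {
        for (String sstable : candidates) {
            if (compacting.contains(sstable)) {
                return false;
            }
        }
        compacting.addAll(candidates);
        return true;
    }

    /** Releases sstables once their compaction has finished or failed. */
    synchronized void unmarkCompacting(List<String> done) {
        compacting.removeAll(done);
    }
}

class ParallelBucketsDemo {
    public static void main(String[] args) throws Exception {
        CompactingSet tracker = new CompactingSet();
        // Two buckets with disjoint sstables (names are made up).
        List<List<String>> buckets = List.of(
                List.of("small-1", "small-2", "small-3", "small-4"),
                List.of("large-1", "large-2"));

        ExecutorService pool = Executors.newFixedThreadPool(2);
        List<Future<Boolean>> results = new ArrayList<>();
        for (List<String> bucket : buckets) {
            results.add(pool.submit(() -> {
                if (!tracker.tryMarkCompacting(bucket)) {
                    return false; // another task already owns one of these
                }
                try {
                    // placeholder: the real merge of the bucket would go here
                    return true;
                } finally {
                    tracker.unmarkCompacting(bucket);
                }
            }));
        }
        for (Future<Boolean> result : results) {
            System.out.println("compacted: " + result.get());
        }
        pool.shutdown();
    }
}
```

Claiming a bucket all-or-nothing means two tasks can never compact the same sstable, while buckets with disjoint sstables proceed independently of their size.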
