cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Benedict (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-7282) Faster Memtable map
Date Mon, 09 Jul 2018 18:55:00 GMT


Benedict commented on CASSANDRA-7282:

bq. but without Value Types that's a pretty long shot with Java

Well, you can do n-ary (i.e. B-Tree) equivalents of the Van Emde Boas layout; or you could
do a simpler binary Van Emde Boas initial search (sized to about a cache fetch unit), to select
the correct range of a sorted array to search, which would probably yield similar results
for our domain sizes.  

But to be honest I only really put the binary Van Emde Boas approach in here to save time,
as it was faster than digging too closely into the differences in the benchmark between the
two more basic approaches.

bq. I'll start the other ticket by moving the range detection to an upper level by creating
a new Memtable per range

Great.  Thanks

bq.  I'll need to find a good way from getting that event from bootstrap & streaming 

So thinking about this a bit more, it seems there are four options:

1) Locate all the places the token ring changes, insert some code
2) Subscribe to changes in the token ring, and flush if we introduce a new range (or expand
an existing range)
3) Detect an out-of-range message, and synchronise our ranges with the ring
4) Detect an out-of-range message, and simply route it to an out-of-range bucket

The thing is, there's nothing right now preventing us receiving records for a range not assigned
to us.  So we will need an out-of-range bucket.  But at the same time, we will have to update
ourselves, consciously, when the ring updates.  If we're moving up out of the memtable to
impose this, we cannot rely on flush to resynchronise ourselves.

So, probably we will need an out-of-range bucket anyway - that IMO should be served by a CSLM
until we implement NBHOM degrading to a skip list under skew.

But we also want to know when our owned ranges change - and this can happen for a variety
of circumstances.  Probably the most robust thing to do, is to subscribe to changes in TokenMetadata.
 Which isn't currently possible, but shouldn't be onerous to add - though I hate adding to
an already ugly class.

bq. Also, any table not using ranges (such as system, maybe some new virtual ones?) don't
use any.

Those tables that don't use any ranges all use the LocalStrategy for replication, which right
now looks like it would yield a separate range for every token in the cluster, all mapping
to localhost.  We should probably confirm this is the case, and override the method to just
return a single range owned by localhost.

We should probably take this discussion to another ticket though.

> Faster Memtable map
> -------------------
>                 Key: CASSANDRA-7282
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Benedict
>            Assignee: Michael Burman
>            Priority: Major
>              Labels: performance
>             Fix For: 4.x
>         Attachments: jasobrown-sample-run.txt, profile.yaml, reads.svg, run1.svg, writes.svg
> Currently we maintain a ConcurrentSkipLastMap of DecoratedKey -> Partition in our
memtables. Maintaining this is an O(lg(n)) operation; since the vast majority of users use
a hash partitioner, it occurs to me we could maintain a hybrid ordered list / hash map. The
list would impose the normal order on the collection, but a hash index would live alongside
as part of the same data structure, simply mapping into the list and permitting O(1) lookups
and inserts.
> I've chosen to implement this initial version as a linked-list node per item, but we
can optimise this in future by storing fatter nodes that permit a cache-line's worth of hashes
to be checked at once,  further reducing the constant factor costs for lookups.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message