cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Benedict (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-7282) Faster Memtable map
Date Thu, 05 Jul 2018 00:52:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16533153#comment-16533153
] 

Benedict commented on CASSANDRA-7282:
-------------------------------------

I actually wonder if we should make this ticket depend on a new ticket to separate all memtables
by token range.  By doing this at a slightly higher level, we can flush data for each token
range on an independent schedule.  This might permit more even use our disks, and reduce
overall compaction work by permitting us to flush larger sstables on average.

It's also an easier level to integrate decisions about token range changes, and avoids the
unsightly slicing (on flush) of already sliced data.

> Faster Memtable map
> -------------------
>
>                 Key: CASSANDRA-7282
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Benedict
>            Assignee: Michael Burman
>            Priority: Major
>              Labels: performance
>             Fix For: 4.x
>
>         Attachments: jasobrown-sample-run.txt, profile.yaml, reads.svg, run1.svg, writes.svg
>
>
> Currently we maintain a ConcurrentSkipLastMap of DecoratedKey -> Partition in our
memtables. Maintaining this is an O(lg(n)) operation; since the vast majority of users use
a hash partitioner, it occurs to me we could maintain a hybrid ordered list / hash map. The
list would impose the normal order on the collection, but a hash index would live alongside
as part of the same data structure, simply mapping into the list and permitting O(1) lookups
and inserts.
> I've chosen to implement this initial version as a linked-list node per item, but we
can optimise this in future by storing fatter nodes that permit a cache-line's worth of hashes
to be checked at once,  further reducing the constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org


Mime
View raw message