cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Benedict (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-7282) Faster Memtable map
Date Thu, 05 Jul 2018 16:44:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16533884#comment-16533884
] 

Benedict commented on CASSANDRA-7282:
-------------------------------------

Great, thanks.  The mismatch in expectations is probably because this is not the optimal
way to implement a binary search variant.

Ideally it would have two (or three) backing arrays, as opposed to a backing array of nodes.
 One primitive array of one side of bounds (e.g. lower bounds) to search on; another of
keys, and another primitive array of the opposing bounds to filter results (or if the bounds
are encoded in the keys already, that could be skipped).

Do you mind adding that variant to the comparison?

> Faster Memtable map
> -------------------
>
>                 Key: CASSANDRA-7282
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Benedict
>            Assignee: Michael Burman
>            Priority: Major
>              Labels: performance
>             Fix For: 4.x
>
>         Attachments: jasobrown-sample-run.txt, profile.yaml, reads.svg, run1.svg, writes.svg
>
>
> Currently we maintain a ConcurrentSkipLastMap of DecoratedKey -> Partition in our
memtables. Maintaining this is an O(lg(n)) operation; since the vast majority of users use
a hash partitioner, it occurs to me we could maintain a hybrid ordered list / hash map. The
list would impose the normal order on the collection, but a hash index would live alongside
as part of the same data structure, simply mapping into the list and permitting O(1) lookups
and inserts.
> I've chosen to implement this initial version as a linked-list node per item, but we
can optimise this in future by storing fatter nodes that permit a cache-line's worth of hashes
to be checked at once,  further reducing the constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org


Mime
View raw message