cassandra-commits mailing list archives

From "Ariel Weisberg (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
Date Tue, 02 Dec 2014 15:28:18 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14231627#comment-14231627
] 

Ariel Weisberg commented on CASSANDRA-7438:
-------------------------------------------

Robert, I don't seem to be getting the latest code for your work on master. For instance, the
key comparison code compares 8 bytes at a time and, as far as I can tell, doesn't handle
trailing bytes.
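
To illustrate the trailing-byte concern, here is a minimal sketch of a word-at-a-time key
comparison that also covers keys whose length is not a multiple of 8. It is not the code from
the patch: the class and method names are hypothetical, it works on ByteBuffers rather than raw
off-heap addresses, and it assumes both buffers use the same byte order.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration only; not taken from the patch under review.
final class KeyCompare {
    /** Returns true if both buffers hold the same remaining bytes. */
    static boolean keysEqual(ByteBuffer a, ByteBuffer b) {
        int len = a.remaining();
        if (len != b.remaining()) {
            return false;
        }
        int pa = a.position();
        int pb = b.position();
        int i = 0;
        // Compare whole 8-byte words first. Absolute gets leave the positions untouched.
        // Assumes both buffers share the same byte order.
        for (; i + 8 <= len; i += 8) {
            if (a.getLong(pa + i) != b.getLong(pb + i)) {
                return false;
            }
        }
        // Then compare the 0-7 trailing bytes that do not fill a whole word.
        for (; i < len; i++) {
            if (a.get(pa + i) != b.get(pb + i)) {
                return false;
            }
        }
        return true;
    }
}
```

Without the second loop, two 11-byte keys that differ only in their last byte would compare
as equal.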

To Vijay's point: a pseudo-random test against the map that runs, say, 200 million operations
against a keyspace of several million entries, mirrors the operations on a regular hash
map, and periodically checks that both have the same contents would help build confidence
in the map. Size it so the LRU doesn't evict anything. Print the seed at the beginning of the
test so a failure can be reproduced. I think this basically duplicates the benchmark, but having
it as a unit test is nice. We can tune the number of operations and keys down for running
in CI. You could also look at the unit tests for Guava's cache or j.u.HashMap and borrow those.
The nice thing about data structure APIs is that the tests already exist.
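
A rough sketch of such a mirrored test follows. The class name, method signature, and operation
mix are assumptions made for illustration. The candidate is typed as a plain java.util.Map so
the sketch is self-contained; the real off-heap map would need an adapter, and it would have to
be sized so that nothing is evicted.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Random;

// Hypothetical harness for illustration; the candidate stands in for the off-heap map.
final class MirrorMapCheck {
    /**
     * Applies the same random put/remove/get sequence to the candidate and to a
     * reference HashMap. Returns false as soon as the two disagree.
     */
    static boolean run(Map<Long, Long> candidate, long seed, int ops, int keySpace, int checkEvery) {
        System.out.println("seed = " + seed); // printed first so a failing run can be replayed
        Random rnd = new Random(seed);
        Map<Long, Long> reference = new HashMap<>();
        for (int i = 1; i <= ops; i++) {
            long key = rnd.nextInt(keySpace);
            int op = rnd.nextInt(3);
            if (op == 0) {
                long value = rnd.nextLong();
                candidate.put(key, value);
                reference.put(key, value);
            } else if (op == 1) {
                candidate.remove(key);
                reference.remove(key);
            } else if (!Objects.equals(candidate.get(key), reference.get(key))) {
                return false;
            }
            // Periodic full comparison of the contents of both maps.
            if (i % checkEvery == 0 && !reference.equals(candidate)) {
                return false;
            }
        }
        return reference.equals(candidate);
    }
}
```

A CI-sized run could look like
MirrorMapCheck.run(new java.util.concurrent.ConcurrentHashMap<>(), System.nanoTime(), 200_000, 10_000, 20_000),
with the operation and key counts raised for a soak run.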

bq. Yes, basically from JDK. Could not get that via inheritance.
What are the licensing and attribution requirements for that code?

bq. IMO hash code should be 64 bits because 32 bits might not be sufficient.
[~benedict] might have some opinions on how to get the best bits out of MurmurHash3. With
128-byte entries, 32 bits covers 256-512 gigabytes of cache, which is not bad. I don't feel
strongly either way since I don't know whether callers will have the hash precomputed.
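
For reference, the 256-512 gigabyte range matches 2^31 to 2^32 distinct hash values at 128
bytes per entry. Reading the lower bound as a signed (31-bit) hash is an assumption, not
something stated above, and the helper below is hypothetical.

```java
// Hypothetical helper for illustration; one cache entry per distinct hash value is assumed.
final class HashCapacity {
    /** Bytes of cache covered when each of the 2^hashBits hash values maps to one entry. */
    static long addressableBytes(int hashBits, long entryBytes) {
        return (1L << hashBits) * entryBytes;
    }
}
```

HashCapacity.addressableBytes(31, 128) is 2^38 bytes (256 GiB) and
HashCapacity.addressableBytes(32, 128) is 2^39 bytes (512 GiB).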

bq. Nope - would not be. But it's 2^27 (limited by a stupid constant used for both max# of
segments and max# of buckets). Worth taking a look at it - it's weird, yes.
In OffHeapMap line 222 there seems to be a gate preventing rehashing to more than 2^24 buckets.

bq. (Hope I caught all of your comments)
I'll check them once you update.

> Serializing Row cache alternative (Fully off heap)
> --------------------------------------------------
>
>                 Key: CASSANDRA-7438
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7438
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Linux
>            Reporter: Vijay
>            Assignee: Vijay
>              Labels: performance
>             Fix For: 3.0
>
>         Attachments: 0001-CASSANDRA-7438.patch, tests.zip
>
>
> Currently SerializingCache is only partially off heap; keys are still stored in the JVM heap as BB.
> * There are higher GC costs for a reasonably big cache.
> * Some users have used the row cache efficiently in production for better results, but this requires careful tuning.
> * The memory overhead of the cache entries is relatively high.
> So the proposal for this ticket is to move the LRU cache logic completely off heap and use JNI to interact with the cache. We might want to ensure that the new implementation matches the existing APIs (ICache), and the implementation needs to have safe memory access, low memory overhead, and as few memcpys as possible.
> We might also want to make this cache configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
