cassandra-commits mailing list archives

From "Benedict (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
Date Wed, 03 Dec 2014 09:57:13 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14232833#comment-14232833 ]

Benedict commented on CASSANDRA-7438:
-------------------------------------

re: hash bits:

There's not really a dramatic benefit to using more than 32 bits. We will always use the upper
bits for the segment and the lower bits for the bucket, for which 4B items is plenty. We don't
have proper entropy in all of the bits, though; we may have only 28 bits of good
collision-freeness. We will want to rehash the murmur hash so that this entropy is spread
evenly across the bits; otherwise a grow boundary could consistently fail to reduce collisions.
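
A minimal sketch of the split described above, in Java. The class and method names
(HashSplit, remix, segmentIndex, bucketIndex) are hypothetical and not taken from the patch,
and the mixing step borrows the 32-bit MurmurHash3 finalizer as one plausible way to rehash;
any well-distributed mixer would do. It assumes a power-of-two bucket count per segment.

```java
final class HashSplit {
    // Assumption: the MurmurHash3 32-bit finalizer is used to spread entropy
    // across all bits before the hash is split. This is illustrative only.
    static int remix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // The upper segmentBits bits of the hash select the segment.
    // Java masks int shift distances to 5 bits, so a shift by 32 would be a
    // shift by 0; the zero-bit case is therefore handled explicitly.
    static int segmentIndex(int hash, int segmentBits) {
        return segmentBits == 0 ? 0 : hash >>> (32 - segmentBits);
    }

    // The lower bits of the hash select the bucket within the segment.
    // bucketCount must be a power of two for the mask to be valid.
    static int bucketIndex(int hash, int bucketCount) {
        return hash & (bucketCount - 1);
    }
}
```

Each time a segment's table doubles, bucketIndex consumes one more low bit. If that bit is
poorly distributed, the doubling does not split the existing chains, which is the failure at a
grow boundary that the remix step is meant to avoid.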

The one advantage of having some spare hash bits is that we can use them to avoid running
a potentially expensive comparison on a large key until we have high confidence we've found
the correct item, and as the number of hash bits not used for indexing dwindles, the value of
this goes up. But the number of instances where this helps will be vanishingly small, since
the head of the key will be on the same cache line, and a hash collision combined with a key
prefix collision is pretty unlikely. It might be more significant if we were to use
open-address hashing, as we would have excellent locality and would reduce the number of
expected cache misses for a lookup. But this won't be measurable above the cache serialization
costs. We do typically already have these hash bits calculated in C*. We are also unlikely to
notice the overhead: allocations are likely to have ~16 bytes of overhead and be padded to the
nearest 8 or 16 bytes, and a row has a lot of bumf to encode. I doubt there will be any
variation in storage costs from using all 64 bits.
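
To illustrate the spare-bits point, here is a rough sketch in Java of an entry that keeps the
full 64-bit hash next to the key and compares hashes before touching the key bytes. The
HashFilteredEntry class, its fields, and the fullComparisons counter are hypothetical; the
actual cache is off heap, so this on-heap version only shows the order of the checks.

```java
import java.util.Arrays;

final class HashFilteredEntry {
    final long hash;   // full 64-bit hash stored alongside the key
    final byte[] key;  // potentially large key

    // Counts how often the expensive byte-by-byte comparison actually runs.
    static int fullComparisons = 0;

    HashFilteredEntry(long hash, byte[] key) {
        this.hash = hash;
        this.key = key;
    }

    // Cheap check first: entries that merely share a bucket almost always
    // differ in the stored hash, so the key bytes are rarely compared.
    boolean matches(long probeHash, byte[] probeKey) {
        if (hash != probeHash) {
            return false;
        }
        fullComparisons++;
        return Arrays.equals(key, probeKey);
    }
}
```

The full comparison still has to run on a hash match, since equal hashes do not prove equal
keys; the stored hash only filters out the mismatches cheaply.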

i.e., whatever floats your boat

> Serializing Row cache alternative (Fully off heap)
> --------------------------------------------------
>
>                 Key: CASSANDRA-7438
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7438
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Linux
>            Reporter: Vijay
>            Assignee: Vijay
>              Labels: performance
>             Fix For: 3.0
>
>         Attachments: 0001-CASSANDRA-7438.patch, tests.zip
>
>
> Currently SerializingCache is only partially off heap; keys are still stored in the JVM heap as BB.
> * There is a higher GC cost for a reasonably big cache.
> * Some users have used the row cache efficiently in production for better results, but this requires careful tuning.
> * Overhead in memory for the cache entries is relatively high.
> So the proposal for this ticket is to move the LRU cache logic completely off heap and use JNI to interact with the cache. We might want to ensure that the new implementation matches the existing APIs (ICache), and the implementation needs to have safe memory access, low overhead in memory, and as few memcpy's as possible.
> We might also want to make this cache configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
