cassandra-commits mailing list archives

From "Robert Stupp (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
Date Thu, 18 Dec 2014 11:31:14 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251537#comment-14251537
] 

Robert Stupp commented on CASSANDRA-7438:
-----------------------------------------

I’ve nearly finished the OHC implementation. Unit tests cover all functionality required
by C*, and a separate test-only implementation is now used to cross-check the real one (entry
(de)serialization is not yet covered extensively by the tests). The OHC interface has been changed
to match the functionality required by C*.

Maven runs the unit tests both without and with jemalloc (the latter only if jemalloc is installed,
of course).

[~aweisberg], [~benedict] can you have a look at the current OHC code?

I’d like to know how it could/should be integrated into C*. IMO there are two decisions to
be made:
* Whether to migrate the whole OHC code into the org.apache.cassandra codebase (with the option to
turn it on or off).
* Whether to implement a “pluggable row cache” (to allow multiple implementations).
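
To make the “pluggable row cache” option more concrete, here is a minimal sketch of what a pluggable backend could look like. All names ({{RowCacheBackend}}, {{OnHeapLruBackend}}, {{RowCacheBackends}}) are hypothetical and invented for illustration; this is neither the OHC API nor the existing C* {{ICache}} interface, and the on-heap LRU map is only a stand-in for a real implementation.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical minimal cache contract (NOT the real C* ICache interface).
interface RowCacheBackend {
    void put(String key, byte[] value);
    byte[] get(String key);
    long size();
}

// Stand-in implementation: a bounded on-heap LRU map. An off-heap
// implementation would plug in behind the same interface.
final class OnHeapLruBackend implements RowCacheBackend {
    private final Map<String, byte[]> map;

    OnHeapLruBackend(final int capacity) {
        // accessOrder=true makes iteration order least-recently-used first
        this.map = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > capacity;
            }
        };
    }

    @Override public synchronized void put(String key, byte[] value) { map.put(key, value); }
    @Override public synchronized byte[] get(String key) { return map.get(key); }
    @Override public synchronized long size() { return map.size(); }
}

// Hypothetical registry: a configuration value would select the backend by name.
final class RowCacheBackends {
    private static final Map<String, Supplier<RowCacheBackend>> REGISTRY = new HashMap<>();

    static synchronized void register(String name, Supplier<RowCacheBackend> factory) {
        REGISTRY.put(name, factory);
    }

    static synchronized RowCacheBackend create(String name) {
        Supplier<RowCacheBackend> factory = REGISTRY.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown row cache backend: " + name);
        }
        return factory.get();
    }
}
```

With something along these lines, turning a particular implementation on or off reduces to which name is registered and selected; whether that indirection is worth it compared to migrating the OHC code directly is exactly the open question.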

I've got some ideas regarding the row cache that are out of scope for this ticket:
* A new per-table knob to choose whether entries are populated into the row cache on reads+writes
or just on reads (to target different workloads)
* Reconsider whether to keep the current {{RowCacheSentinel}} implementation as is. If
I understand it correctly, it just reduces the number of cache-put operations (a cache hit on
a sentinel performs a disk read). Is it a compromise regarding additional serialization cost?
* Improve key (de)serialization (saving the row cache to disk): use direct I/O
* Optimize value deserialization: let C* directly access a cached row in off-heap
memory and avoid the deserialization (and on-heap object construction) overhead.
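
The last idea (direct access to a cached row in off-heap memory) can be illustrated roughly as follows. This is only a sketch under assumptions: the row layout ({{[int valueCount]}} followed by {{[int length][bytes]}} per value) and the class {{OffHeapRowSketch}} are made up for this example and do not reflect the C* or OHC serialization format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class OffHeapRowSketch {
    // Hypothetical layout: [int valueCount], then per value [int length][bytes].
    static ByteBuffer write(String... values) {
        int size = Integer.BYTES;
        byte[][] encoded = new byte[values.length][];
        for (int i = 0; i < values.length; i++) {
            encoded[i] = values[i].getBytes(StandardCharsets.UTF_8);
            size += Integer.BYTES + encoded[i].length;
        }
        ByteBuffer buf = ByteBuffer.allocateDirect(size); // memory outside the Java heap
        buf.putInt(values.length);
        for (byte[] e : encoded) {
            buf.putInt(e.length);
            buf.put(e);
        }
        buf.flip();
        return buf;
    }

    // Returns a read-only view of one value. Only length prefixes are read;
    // the value bytes are not copied to the heap and no row object is built.
    static ByteBuffer valueAt(ByteBuffer row, int index) {
        int count = row.getInt(0);
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        int pos = Integer.BYTES;
        for (int i = 0; i < index; i++) {
            pos += Integer.BYTES + row.getInt(pos); // skip length prefix + payload
        }
        int len = row.getInt(pos);
        ByteBuffer view = row.asReadOnlyBuffer(); // shares memory, independent position/limit
        view.position(pos + Integer.BYTES);
        view.limit(pos + Integer.BYTES + len);
        return view.slice();
    }
}
```

The caller receives a view over the same off-heap memory, so lifetime management (making sure the entry is not evicted and freed while a view is still in use) would be the hard part of doing this for real.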

Note: although the jemalloc allocator provides a {{getTotalAllocated()}} method, the result
is not correct and I don't know why. The result depends on the jemalloc configure settings ({{--enable-tcache}} / {{--disable-tcache}}).
According to the man page the result should be correct (the sum of {{stats.allocated}} and {{stats.huge.allocated}}),
but it isn't (verified with a "coded memory leak of small allocations" that didn't increase
the value). Iterating over the jemalloc _arenas_ and _bins_ does not help, since the two values mentioned
are aggregations of these.


> Serializing Row cache alternative (Fully off heap)
> --------------------------------------------------
>
>                 Key: CASSANDRA-7438
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7438
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Linux
>            Reporter: Vijay
>            Assignee: Vijay
>              Labels: performance
>             Fix For: 3.0
>
>         Attachments: 0001-CASSANDRA-7438.patch, tests.zip
>
>
> Currently SerializingCache is only partially off heap; keys are still stored in the JVM heap as BB.
> * There are higher GC costs for a reasonably big cache.
> * Some users have used the row cache efficiently in production for better results, but this requires careful tuning.
> * The memory overhead for the cache entries is relatively high.
> So the proposal for this ticket is to move the LRU cache logic completely off heap and use JNI to interact with the cache. We might want to ensure that the new implementation matches the existing APIs (ICache), and the implementation needs to have safe memory access, low memory overhead and fewer memcpys (as far as possible).
> We might also want to make this cache configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
