cassandra-commits mailing list archives

From "Ariel Weisberg (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
Date Wed, 31 Dec 2014 23:18:16 GMT


Ariel Weisberg commented on CASSANDRA-7438:

bq. Whether to migrate whole OHC code into org.apache.cassandra codebase (with the option
to either turn it on or off).
I am open to either. I asked Benedict and he prefers having it inside C* so we can patch it.
The advantage of having it outside is that it might see use elsewhere and get additional eyes/contributions.
You could start with it outside and publish to Maven Central, and if there is an issue getting
patches applied quickly we can always fork it in C*.

bq. Whether to implement a “pluggable row cache” (to allow multiple implementations)
I think that we aren't going to need multiple cache implementations in the long run. Seems
like we should be able to have one that can be configured to have the desired behavior. Benedict
doesn't feel strongly about it either. If Vijay wants to continue working on another implementation
then we would want to keep it pluggable the way it currently is.

It looks like the KeyCache and CounterCache both use a different implementation and not SerializingCache.
I am not clear on why they don’t use serializing cache. It's worth evaluating why that is
before converging on a single implementation.

bq. New per-table knob to enable whether to populate entries to the row cache on reads+writes
or just on reads (to target different workloads)
Sounds like it would be useful, but first we need to find a user who actually asks for this,
or a workload where this is the right call. There may also be correctness issues
to think about; see the next item.
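
To make the proposed knob concrete, here is a minimal single-threaded sketch of the two population modes. Everything in it is hypothetical: the `RowCachePopulate` enum, the `KnobCacheSketch` class and the in-memory "disk" map are illustrative names only, not existing Cassandra options or classes, and the sketch ignores the concurrency and correctness questions raised above.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-table setting; not an existing Cassandra option. */
enum RowCachePopulate { ON_READ, ON_READ_AND_WRITE }

/** Toy cache in front of a toy "disk" map, to show where the knob would act. */
final class KnobCacheSketch {
    private final Map<String, String> cache = new HashMap<>();
    private final Map<String, String> disk = new HashMap<>();
    private final RowCachePopulate mode;

    KnobCacheSketch(RowCachePopulate mode) {
        this.mode = mode;
    }

    void write(String key, String row) {
        disk.put(key, row);
        if (mode == RowCachePopulate.ON_READ_AND_WRITE) {
            cache.put(key, row);   // write-heavy reuse: keep the fresh row cached
        } else {
            cache.remove(key);     // read-only population: just drop the stale entry
        }
    }

    String read(String key) {
        String row = cache.get(key);
        if (row == null) {
            row = disk.get(key);
            if (row != null) {
                cache.put(key, row); // both modes populate on a read miss
            }
        }
        return row;
    }

    boolean isCached(String key) {
        return cache.containsKey(key);
    }
}
```

With `ON_READ`, a written row only enters the cache once something reads it; with `ON_READ_AND_WRITE` it is cached immediately, which only pays off if recently written rows are likely to be read soon.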

bq. Rethink about whether to keep the current RowCacheSentinel implementation as is - if I
understand it correctly, it just reduces the number of cache-put operations (cache hit on
a sentinel performs a disk read). A compromise regarding additional serialization cost?
I think it is for correctness?
I'm still reading up on this.
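
If the sentinel is indeed there for correctness, the general pattern would look roughly like the sketch below: a reader installs a placeholder before going to disk and only swaps in the data if its own placeholder is still present, so a write that invalidated the key in the meantime prevents a stale row from being cached. This is a simplified illustration of that pattern under my own assumptions, not the actual RowCacheSentinel code; `Sentinel`, `SentinelCacheSketch` and the `readFromDisk` callback are hypothetical names.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical marker placed in the cache while a disk read is in flight. */
final class Sentinel {}

/** Toy read-through cache; names are illustrative, not Cassandra's API. */
final class SentinelCacheSketch {
    private final ConcurrentHashMap<String, Object> cache = new ConcurrentHashMap<>();

    /** A write invalidates the key, which also removes any in-flight sentinel. */
    void invalidate(String key) {
        cache.remove(key);
    }

    /** Returns the cached row, or null if the key is absent or holds only a sentinel. */
    String peek(String key) {
        Object v = cache.get(key);
        return (v instanceof String) ? (String) v : null;
    }

    String read(String key, Function<String, String> readFromDisk) {
        Object cached = cache.get(key);
        if (cached instanceof String) {
            return (String) cached;              // real cache hit
        }
        if (cached instanceof Sentinel) {
            return readFromDisk.apply(key);      // another read in flight: go to disk, do not cache
        }

        Sentinel mine = new Sentinel();
        boolean installed = cache.putIfAbsent(key, mine) == null;
        String data = readFromDisk.apply(key);
        if (installed) {
            if (data != null) {
                // Succeeds only if our sentinel is still there, i.e. no invalidation happened.
                cache.replace(key, mine, data);
            } else {
                cache.remove(key, mine);         // nothing to cache; clean up our sentinel
            }
        }
        return data;
    }
}
```

In this sketch a hit on a sentinel costs a disk read, which matches the behavior described in the quoted item; what the sentinel buys is the conditional `replace`, which refuses to cache a row that was invalidated while it was being read.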

bq. Improvement of key (de)serialization (saving the row cache to disk) - use direct I/O
There is some trickiness here because the AutoSavingCache breaks apart the keys to determine
where the data goes.
bq. Optimizations of value deserialization effort - let C* directly access a cached row in
off-heap memory instead of the deserialization (and on-heap object construction) overhead.
I think these two together would make a good follow-up ticket. Another good follow-up ticket
would be addressing the allocator for performance and for fragmentation.

> Serializing Row cache alternative (Fully off heap)
> --------------------------------------------------
>                 Key: CASSANDRA-7438
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Linux
>            Reporter: Vijay
>            Assignee: Vijay
>              Labels: performance
>             Fix For: 3.0
>         Attachments: 0001-CASSANDRA-7438.patch,
> Currently SerializingCache is partially off heap, keys are still stored in JVM heap as
> * There are higher GC costs for a reasonably big cache.
> * Some users have used the row cache efficiently in production for better results, but
this requires careful tuning.
> * Memory overhead for the cache entries is relatively high.
> So the proposal for this ticket is to move the LRU cache logic completely off heap and
use JNI to interact with the cache. We might want to ensure that the new implementation matches
the existing APIs (ICache), and the implementation needs to have safe memory access, low
memory overhead, and as few memcpys as possible.
> We might also want to make this cache configurable.

This message was sent by Atlassian JIRA
