cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ariel Weisberg (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
Date Tue, 09 Dec 2014 17:39:14 GMT


Ariel Weisberg commented on CASSANDRA-7438:

Lot's of cool stuff here Robert.

Unit test wise there is a lot of code that is only covered indirectly or not at all and the
behaviors are not checked for explicitly. I don't think it makes sense to include code that
doesn't doesn't have a unit test claiming it does what is says it does. The various input/output
streams, buffering, and compression all need units tests. needs a unit test for pretty
much every method as well as the validation functionality. HashEntry has a bunch of uncovered
functions. For me the lack of test coverage is the biggest barrier.

What I am reacting to is that the tests are black box and miss things. OffHeapMap containsEntry
has no tests. removeEntry has untested code. removeLink still has untested code. There is
untested histogram stuff, deserializeEntry, serializeEntry. HashEntry classes have untested
functions. HashEntries has many predicates that are untested.

Having a unit test that fuzzes against a parallel implementation at the same time using a
different LRU map implementation would be great for a black box test. You can stripe the other
implementation the same way so that the eviction matches.

One of my previous comments was that SegmentCacheImpl duplicates reference counting code from
OffHeapMap and should just delegate. It ends up doing that anyways.

I would really like to see the cleanup/eviction code go away. If inserting an entry would
blow capacity remove entries until it doesn't. I don't see a reason to monkey with thresholds.

At some point the existing C* cache interface needs to gel with your work. Right now C* uses
the hotN and getKeys interface to return the contents of the cache for persistence. I think
the path of least resistance to start would be to implement the existing interface and then
come back and look at how to get compression and more efficient IO into all the implementations.
The existing stuff in C* doesn't do compression and doesn't buffer its IO. I would prefer
to minimize major changes to the existing C* code. I want to get it working and then iterate
further for other improvements like more efficient cache serialization.

You could change the OHC interface or implement an adapter. I think it's fine to modify ICache
to return iterables or iterators instead of a collections to incrementally produce key set
and hot keys. For everything else I would really like to see things to stay the same unless
there is something to be gained by changing the interface.

> Serializing Row cache alternative (Fully off heap)
> --------------------------------------------------
>                 Key: CASSANDRA-7438
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Linux
>            Reporter: Vijay
>            Assignee: Vijay
>              Labels: performance
>             Fix For: 3.0
>         Attachments: 0001-CASSANDRA-7438.patch,
> Currently SerializingCache is partially off heap, keys are still stored in JVM heap as
> * There is a higher GC costs for a reasonably big cache.
> * Some users have used the row cache efficiently in production for better results, but
this requires careful tunning.
> * Overhead in Memory for the cache entries are relatively high.
> So the proposal for this ticket is to move the LRU cache logic completely off heap and
use JNI to interact with cache. We might want to ensure that the new implementation match
the existing API's (ICache), and the implementation needs to have safe memory access, low
overhead in memory and less memcpy's (As much as possible).
> We might also want to make this cache configurable.

This message was sent by Atlassian JIRA

View raw message