cassandra-commits mailing list archives

From "Ben Manes (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-11452) Cache implementation using LIRS eviction for in-process page cache
Date Sat, 16 Apr 2016 19:46:25 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15244357#comment-15244357 ]

Ben Manes edited comment on CASSANDRA-11452 at 4/16/16 7:46 PM:
----------------------------------------------------------------

CLHM was always a decorator, but in 1.4 it embedded the CHMv8 backport. We did that to help
improve performance for very large caches, like Cassandra's, since JDK 8 took a long time
to arrive. That's probably what you're remembering.

I agree that reducing per-entry overhead is attractive, though a [rough calculation|https://github.com/ben-manes/caffeine/wiki/Memory-overhead]
indicates it isn't a huge savings. My view is that it is a premature optimization, best left
until the implementation has matured, at which point we can re-evaluate whether the impact is
worth attempting a direct rewrite. Otherwise it adds greatly to the complexity budget from the
get-go and leaves less time focused on the unique problems of the domain (API, features,
efficiency). For example, there are greater space savings in using TinyLFU over LIRS's ghost
entries, but evaluating that took effort I might have been too overwhelmed to expend. It would
also be interesting to see whether pairing with [Apache Mnemonic|https://github.com/apache/incubator-mnemonic]
could reduce GC overhead by going off-heap without the serialization penalty.
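
For a rough sense of scale (purely illustrative numbers, not taken from that wiki page): at
~100 bytes of bookkeeping per entry, a million-entry cache spends ~100 MB on metadata, but
when the values are 4-64 KB buffers, as in a page cache, that is only about 0.2-2.5% of the
space the data itself occupies, so even halving it buys little.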

bq. Just to clarify those numbers are for small workloads?

Yep.

bq. ...it would still leave the gate open for an attacker to reduce the efficacy of the cache
for items that have only moderate reuse likelihood.

Since the frequency is halved every sample period, my assumption was that this attack would
be very difficult. Gil's response was instead to detect whether TinyLFU had a large number
of consecutive rejections, e.g. 80 (assuming 1:20 is admitted on average). That worked quite
well, except on ARC's database trace (ds1), where it had a negative impact. It makes sense
that scans (db, analytics) will have a high rejection rate. What do you think about combining
the approaches, e.g. {{(candidateFreq <= 3) || (++unadmittedItems < 80)}}, as a guard prior
to performing a 1% random admittance?
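
To make that concrete, here is a rough sketch of how the combined guard might sit in an
admittor. The names ({{TinyLfuAdmittor}}, {{unadmittedItems}}) and the points where the
streak counter resets are illustrative, not actual Caffeine code, and synchronization is
elided:

{code:java}
import java.util.concurrent.ThreadLocalRandom;

final class TinyLfuAdmittor {
  private int unadmittedItems; // consecutive TinyLFU rejections

  /** Decides whether a candidate may replace the eviction victim. */
  boolean admit(int candidateFreq, int victimFreq) {
    if (candidateFreq > victimFreq) {
      unadmittedItems = 0;
      return true; // normal TinyLFU admission
    }
    // Guard: a cold candidate (freq <= 3), or a streak shorter than 80
    // rejections, is rejected outright with no coin flip.
    if ((candidateFreq <= 3) || (++unadmittedItems < 80)) {
      return false;
    }
    // Sustained rejection streak (likely a scan): 1% random admittance.
    if (ThreadLocalRandom.current().nextInt(100) == 0) {
      unadmittedItems = 0;
      return true;
    }
    return false;
  }
}
{code}

That way a cold candidate never reaches the coin flip, and the coin flip itself only happens
once a sustained streak suggests a scan-like workload.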


> Cache implementation using LIRS eviction for in-process page cache
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-11452
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11452
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local Write-Read Paths
>            Reporter: Branimir Lambov
>            Assignee: Branimir Lambov
>
> Following up from CASSANDRA-5863, to make best use of caching and to avoid having to
> explicitly mark compaction accesses as non-cacheable, we need a cache implementation that
> uses an eviction algorithm that can better handle non-recurring accesses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
