lucene-dev mailing list archives

From "Ben Manes (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SOLR-10141) Caffeine cache causes BlockCache corruption
Date Sat, 18 Feb 2017 05:08:44 GMT

    [ https://issues.apache.org/jira/browse/SOLR-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872983#comment-15872983 ]

Ben Manes edited comment on SOLR-10141 at 2/18/17 5:08 AM:
-----------------------------------------------------------

Thanks!!! I think I found the bug. It now passes your test case.

The problem was due to put() stampeding over the value during the eviction. The [eviction
routine|https://github.com/ben-manes/caffeine/blob/65e3efd4b50613c27567ff594877d0f63acfbce2/caffeine/src/main/java/com/github/benmanes/caffeine/cache/BoundedLocalCache.java#L725]
performed the following:
# Read the key, value, etc.
# Conditionally remove the entry in a computeIfPresent() block
#* resurrect it if a race occurred (e.g. it was thought expired, but was newly accessed)
# Mark the entry as "dead" (using a synchronized (entry) block)
# Notify the listener

This failed because [putFast|https://github.com/ben-manes/caffeine/blob/65e3efd4b50613c27567ff594877d0f63acfbce2/caffeine/src/main/java/com/github/benmanes/caffeine/cache/BoundedLocalCache.java#L1521]
can perform its update outside of a hash table lock (e.g. a computation). It synchronizes
on the entry to update it, checking first whether the entry is still alive. This allowed a
race in which the entry was removed from the hash table, then its value was updated, and only
then was the entry marked as dead. When the listener was notified, it received the wrong value.
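
To make the interleaving concrete, here is a minimal single-threaded replay of it. This is only a sketch of my reading of the description above, not Caffeine's code: Entry, putOnEntry and replayRace are hypothetical names, and the writer's step is placed by hand between the removal and the "dead" marking so that the outcome is deterministic.

```java
import java.util.concurrent.ConcurrentHashMap;

class EvictionRaceSketch {
  /** Hypothetical stand-in for a cache entry; not Caffeine's node class. */
  static final class Entry {
    String value;
    boolean alive = true;
    Entry(String value) { this.value = value; }
  }

  final ConcurrentHashMap<String, Entry> data = new ConcurrentHashMap<>();

  /** Update path that locks only the entry, never the hash table. */
  boolean putOnEntry(Entry entry, String newValue) {
    synchronized (entry) {
      if (!entry.alive) {
        return false; // entry already retired, caller must retry elsewhere
      }
      entry.value = newValue;
      return true;
    }
  }

  /** Replays the problematic ordering step by step in one thread. */
  String replayRace() {
    Entry entry = new Entry("v1");
    data.put("k", entry);

    // 1. The evictor reads the value it intends to report.
    String seenByEvictor = entry.value;
    // 2. The evictor removes the mapping (returning null removes it).
    data.computeIfPresent("k", (k, n) -> (n == entry) ? null : n);
    // -- A writer that already holds the entry updates it; it is still "alive".
    boolean writerSucceeded = putOnEntry(entry, "v2");
    // 3. Only now does the evictor mark the entry as dead.
    synchronized (entry) {
      entry.alive = false;
    }
    // 4. The listener is told about the value read in step 1.
    String notified = seenByEvictor;

    return writerSucceeded + ":" + notified + ":" + entry.value;
  }
}
```

In this replay the writer believes its update succeeded, yet the listener hears about v1 while the entry that was thrown away actually held v2.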

The solution I have now is to expand the synchronized block on eviction. This passes your
test and should be cheap. I'd like to review it a little more and incorporate your test into
my suite.
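
One way to picture "expanding the synchronized block" is sketched below. It is an assumption about the general shape of such a fix, not the actual Caffeine patch: the removal, the re-read of the value, and the "dead" marking all happen while holding the entry lock inside computeIfPresent(). Entry, putOnEntry, evict and demo are hypothetical names.

```java
import java.util.concurrent.ConcurrentHashMap;

class ExpandedLockSketch {
  /** Hypothetical stand-in for a cache entry; not Caffeine's node class. */
  static final class Entry {
    String value;
    boolean alive = true;
    Entry(String value) { this.value = value; }
  }

  final ConcurrentHashMap<String, Entry> data = new ConcurrentHashMap<>();

  /** Update path that locks only the entry, as before. */
  boolean putOnEntry(Entry entry, String newValue) {
    synchronized (entry) {
      if (!entry.alive) {
        return false; // retired by an eviction, update is rejected
      }
      entry.value = newValue;
      return true;
    }
  }

  /** Removes the mapping and retires the entry in one critical section. */
  String evict(String key) {
    String[] removedValue = new String[1];
    data.computeIfPresent(key, (k, entry) -> {
      synchronized (entry) {
        removedValue[0] = entry.value; // read the value under the entry lock
        entry.alive = false;           // retire before any writer can slip in
      }
      return null; // returning null removes the mapping
    });
    return removedValue[0]; // what the listener would be told
  }

  /** A write before eviction is reported; a write after eviction is rejected. */
  String demo() {
    Entry entry = new Entry("v1");
    data.put("k", entry);
    boolean before = putOnEntry(entry, "v2");
    String notified = evict("k");
    boolean after = putOnEntry(entry, "v3");
    return before + ":" + notified + ":" + after;
  }
}
```

With this shape a writer either lands before the eviction, in which case the listener sees its value, or finds the entry dead and has to take another path. The extra cost is one uncontended monitor acquisition inside the map's compute, which fits the expectation that the change should be cheap.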

This is an excellent find. I've stared at the code many times and the race seems obvious in
hindsight.


> Caffeine cache causes BlockCache corruption 
> --------------------------------------------
>
>                 Key: SOLR-10141
>                 URL: https://issues.apache.org/jira/browse/SOLR-10141
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Yonik Seeley
>         Attachments: SOLR-10141.patch, Solr10141Test.java
>
>
> After fixing the race conditions in the BlockCache itself (SOLR-10121), the concurrency
> test passes with the previous implementation using ConcurrentLinkedHashMap and fails with Caffeine.



