cassandra-commits mailing list archives

From "Ben Manes (JIRA)" <j...@apache.org>
Subject [jira] Commented: (CASSANDRA-405) Race condition with ConcurrentLinkedHashMap
Date Wed, 02 Sep 2009 03:41:33 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750257#action_12750257 ]

Ben Manes commented on CASSANDRA-405:
-------------------------------------

When performing a postmortem on this issue, please review how the ConcurrentLinkedHashMap
was added.  The project page stated:

> Note: The algorithm needs further testing and is not deemed production ready. It is functional
> under concurrent tests, but needs additional load testing to assert correctness.

That load testing, provided in the standard unit test runs, uncovered the issue, and thus it
was not promoted to release status. I haven't had time in the last few months to work on
this project, but even the last check-in notes that it's leaving debug code in to help resolve
it later. The project states on the front page and in the FAQ that the goal is more educational
than formal usage, hence I avoided known algorithms (which would be the correct approach if
it were work-related).

The ConcurrentLRUCache uses a watermark approach, which is valid but suffers from stampeding
and is an offline algorithm. It's still an excellent approach and one of many possibilities
described in the FAQ. I am personally a fan of soft-reference-based caching for global data,
evicted in LRU order, because it lets the GC manage what it does best (memory!) and helps
avoid overburdening the application server.
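For readers unfamiliar with the watermark idea, here is a minimal sketch under stated assumptions. It is not the ConcurrentLRUCache source: the class name `WatermarkLruCache`, the global access counter, and the `tryLock` guard are all hypothetical. Reads only stamp an entry with a logical time, so no shared list is reordered on access. When a write pushes the size past a high watermark, a single thread snapshots the entries, sorts them by last access, and evicts down to a low watermark. That batch sweep, computing eviction order after the fact instead of maintaining it on every access, is one reading of "offline". The `tryLock` guard is one way to keep many threads from stampeding into the sweep at once, at the cost of letting the map briefly exceed the high watermark.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of a watermark-based LRU cache; not the real ConcurrentLRUCache.
final class WatermarkLruCache<K, V> {
    // A cached value plus the logical time of its most recent access.
    private static final class Node<T> {
        final T value;
        volatile long lastAccess;
        Node(T value, long stamp) { this.value = value; this.lastAccess = stamp; }
    }

    // Frozen (key, node, stamp) triple so the sort sees stable stamps.
    private static final class Candidate<CK, CV> {
        final CK key;
        final Node<CV> node;
        final long stamp;
        Candidate(CK key, Node<CV> node) { this.key = key; this.node = node; this.stamp = node.lastAccess; }
    }

    private final ConcurrentHashMap<K, Node<V>> map = new ConcurrentHashMap<>();
    private final AtomicLong clock = new AtomicLong();
    private final ReentrantLock sweepLock = new ReentrantLock();
    private final int lowWatermark;
    private final int highWatermark;

    WatermarkLruCache(int lowWatermark, int highWatermark) {
        if (lowWatermark < 0 || lowWatermark >= highWatermark) {
            throw new IllegalArgumentException("require 0 <= low < high");
        }
        this.lowWatermark = lowWatermark;
        this.highWatermark = highWatermark;
    }

    V get(K key) {
        Node<V> node = map.get(key);
        if (node == null) return null;
        node.lastAccess = clock.incrementAndGet(); // just a stamp, no list reordering
        return node.value;
    }

    void put(K key, V value) {
        map.put(key, new Node<>(value, clock.incrementAndGet()));
        if (map.size() > highWatermark) sweep();
    }

    int size() { return map.size(); }

    // Batch eviction down to the low watermark, run by at most one thread at a time.
    private void sweep() {
        if (!sweepLock.tryLock()) return; // another thread is already sweeping
        try {
            if (map.size() <= highWatermark) return;
            List<Candidate<K, V>> snapshot = new ArrayList<>();
            for (Map.Entry<K, Node<V>> e : map.entrySet()) {
                snapshot.add(new Candidate<>(e.getKey(), e.getValue()));
            }
            snapshot.sort((a, b) -> Long.compare(a.stamp, b.stamp)); // oldest first
            int toRemove = snapshot.size() - lowWatermark;
            for (int i = 0; i < toRemove; i++) {
                Candidate<K, V> c = snapshot.get(i);
                map.remove(c.key, c.node); // skips keys overwritten since the snapshot
            }
        } finally {
            sweepLock.unlock();
        }
    }
}
```

With watermarks of 2 and 4, inserting keys 1 to 4, reading key 1, and then inserting key 5 triggers one sweep that evicts the three least recently touched keys (2, 3, 4) and leaves 1 and 5.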

Please treat this as an issue where the blame is shared: third-party (3p), because I did not
stress strongly enough that this should not be used in production, and internal, because the
3p project was not evaluated closely enough to notice its warning about production status. I
will update the project page to communicate this better and provide a performant, thread-safe
modification for those who need a solution. Please re-evaluate your own internal processes to
determine why the bad call was made.

I am not trying to shift blame, but my pet peeve is firefighting in production when no one
learns from it, because then it just happens again. It's very frustrating, even more so if I
actually work there! ;-)

Cheers!
Ben

> Race condition with ConcurrentLinkedHashMap
> -------------------------------------------
>
>                 Key: CASSANDRA-405
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-405
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.4
>            Reporter: Chris Goffinet
>            Assignee: Jonathan Ellis
>             Fix For: 0.4
>
>         Attachments: 405.patch, stack.log.gz
>
>
> We are seeing a race condition with ConcurrentLinkedHashMap using appendToTail. We could
> remove the ConcurrentLinkedHashMap for now until that's resolved.
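An access-ordered LRU map is easy to break under concurrency because moving an entry to the tail on every read turns each read into a write. In `java.util.LinkedHashMap` constructed with `accessOrder = true`, `get()` is documented as a structural modification. One conservative stopgap while a concurrent design is investigated is to serialize all access behind a single lock. The following is a hypothetical sketch (the `SimpleLru` name and capacity handling are assumptions) and is not the contents of 405.patch. It trades concurrency for correctness.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: a coarse-grained, thread-safe LRU map.
final class SimpleLru {
    private SimpleLru() {}

    static <K, V> Map<K, V> create(final int capacity) {
        // accessOrder = true: get() moves the entry to the tail, so even reads
        // structurally modify the map and must run under the same lock as writes.
        Map<K, V> lru = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity; // drop the least recently accessed entry
            }
        };
        // Every method call goes through one monitor; iterating over the result
        // still requires an explicit synchronized (map) block, per the JDK docs.
        return Collections.synchronizedMap(lru);
    }
}
```

With a capacity of 2, putting "a" and "b", reading "a", and then putting "c" evicts "b", because "a" was touched more recently.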

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

