cassandra-commits mailing list archives

From "Jon Meredith (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-14790) LongBufferPoolTest burn test fails assertion
Date Wed, 17 Oct 2018 15:23:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16653734#comment-16653734 ]

Jon Meredith commented on CASSANDRA-14790:
------------------------------------------

[~benedict] I think we're reading it the same way. My argument was that it can cause buffers
to be allocated from the heap when MEMORY_USAGE_THRESHOLD has not been exceeded yet. I'd describe
it as a benign race rather than a beneficial one. The calling thread has to pay the price of
allocating chunks that other threads stole, and then of an extra allocation that could possibly
result in a blocking system call to get more memory. Instead, allocateMoreChunks could return one
of the chunks to its caller and add one fewer chunk to the queue.
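
A minimal sketch of that hand-off idea, assuming a heavily simplified pool: HandOffPoolSketch
and its Chunk type are hypothetical stand-ins, and the real allocateMoreChunks signature, the
MEMORY_USAGE_THRESHOLD check and the actual memory allocation are deliberately left out.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Simplified stand-in for the global pool; not the real BufferPool code. */
final class HandOffPoolSketch
{
    static final int CHUNK_SIZE = 64 * 1024;
    static final int MACRO_CHUNK_SIZE = 1024 * 1024;

    /** Hypothetical chunk type: only records its offset into the macro chunk. */
    static final class Chunk
    {
        final int offset;
        Chunk(int offset) { this.offset = offset; }
    }

    final Queue<Chunk> chunks = new ConcurrentLinkedQueue<>();

    /**
     * Carves one macro chunk into chunks, publishes all but one to the shared
     * queue and returns the remaining one directly to the caller, so that no
     * other thread can take it first.
     */
    Chunk allocateMoreChunks()
    {
        Chunk reserved = new Chunk(0);
        for (int offset = CHUNK_SIZE; offset < MACRO_CHUNK_SIZE; offset += CHUNK_SIZE)
            chunks.add(new Chunk(offset));
        return reserved;
    }

    /** Takes a pooled chunk if one is queued, otherwise allocates and keeps one. */
    Chunk get()
    {
        Chunk chunk = chunks.poll();
        return chunk != null ? chunk : allocateMoreChunks();
    }
}
```

With this shape the allocating thread always leaves with a chunk, and 15 rather than 16 chunks
become visible to other threads.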

I'm not even sure it's worth changing anything, but [~djoshi3] wanted to see what you thought
about it.

--8<--
Here's the example I wrote up before I read your comment more carefully.

Start with no allocations from any of the thread local or buffer pools yet.

CHUNK_SIZE = 64 KiB
MACRO_CHUNK_SIZE = 1024 KiB
MEMORY_USAGE_THRESHOLD = 16384 KiB (for the unit test)

1) T1 calls BufferPool.get(1) and ends up in GlobalPool::get. chunks.poll returns null, so it
calls allocateMoreChunks, which allocates a macro chunk and divides it up into 16 (1024 KiB / 64 KiB)
Chunks that are added to BufferPool.GlobalPool.chunks.

2) Between adding the last chunk and the 'one last attempt' to pull it in Chunk.get, 16
other calls to GlobalPool::get take place on other threads, emptying GlobalPool.chunks.

3) T1 returns from allocateMoreChunks. Back in Chunk::get, chunks.poll() returns null, which is
passed up the call chain and causes a call to BufferPool.allocate. That allocates memory outside
of the pool even though the current pool memory usage is only ~1 MiB, which is less than the
usage threshold, so the request should have been satisfied by the pool.
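
The three steps can be replayed deterministically in a small single-threaded model. This is
only a sketch under assumptions: StolenChunksSketch, its integer chunk ids and the interleaving
hook (which stands in for the other threads) are hypothetical, and the shape of get() follows
the description above rather than the real BufferPool source.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Single-threaded model of the interleaving; not the real BufferPool code. */
final class StolenChunksSketch
{
    static final int CHUNK_SIZE_KIB = 64;
    static final int MACRO_CHUNK_SIZE_KIB = 1024;
    static final int MEMORY_USAGE_THRESHOLD_KIB = 16384;

    final Queue<Integer> chunks = new ArrayDeque<>();
    int pooledKiB = 0;

    // Hypothetical hook: whatever other threads do between T1's allocation
    // and T1's final poll. By default nothing happens.
    Runnable interleaving = () -> {};

    /** Step 1: account for one macro chunk and queue its 16 chunks. */
    void allocateMoreChunks()
    {
        pooledKiB += MACRO_CHUNK_SIZE_KIB;
        for (int i = 0; i < MACRO_CHUNK_SIZE_KIB / CHUNK_SIZE_KIB; i++)
            chunks.add(i);
    }

    /** Returns a chunk id, or null if the caller must allocate outside the pool. */
    Integer get()
    {
        Integer chunk = chunks.poll();
        if (chunk != null)
            return chunk;
        allocateMoreChunks();
        interleaving.run();   // step 2: other threads may drain the queue here
        return chunks.poll(); // step 3: the 'one last attempt'
    }
}
```

Setting interleaving to poll the queue 16 times makes get() return null while pooledKiB is
1024, far below MEMORY_USAGE_THRESHOLD_KIB, which is the fallback described in step 3.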

As I said, I don't think it's really a big deal as memory allocated outside the pool should
be freed/garbage collected just fine and the buffer pool is just an optimization.

It's also possible for T1 and T2 to both arrive in allocateMoreChunks with BufferPool.GlobalPool.chunks
empty and cause a harmless allocation of extra buffers, but it looks like it uses atomics
to make sure the MEMORY_USAGE_THRESHOLD invariant isn't exceeded.
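
One common way to keep such an invariant with atomics is a compare-and-set loop on a usage
counter. The sketch below illustrates that general technique only; ThresholdSketch and
tryReserveMacroChunk are hypothetical names, and this is not a claim about how BufferPool
actually implements its accounting.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative CAS-guarded accounting; not the real BufferPool code. */
final class ThresholdSketch
{
    static final long MACRO_CHUNK_SIZE = 1024L * 1024L;
    static final long MEMORY_USAGE_THRESHOLD = 16L * MACRO_CHUNK_SIZE; // 16384 KiB

    final AtomicLong memoryUsage = new AtomicLong();

    /**
     * Reserves room for one macro chunk. Returns false if the reservation
     * would push usage past the threshold.
     */
    boolean tryReserveMacroChunk()
    {
        while (true)
        {
            long current = memoryUsage.get();
            if (current + MACRO_CHUNK_SIZE > MEMORY_USAGE_THRESHOLD)
                return false;
            if (memoryUsage.compareAndSet(current, current + MACRO_CHUNK_SIZE))
                return true;
            // Another thread updated the counter first; re-read and retry.
        }
    }
}
```

Two threads racing here can each reserve a macro chunk, but the counter never passes the
threshold, because a reservation only succeeds against the value it just read.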

> LongBufferPoolTest burn test fails assertion
> --------------------------------------------
>
>                 Key: CASSANDRA-14790
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14790
>             Project: Cassandra
>          Issue Type: Test
>          Components: Testing
>         Environment: Run under macOS 10.13.6, with patch (attached, but also https://github.com/jonmeredith/cassandra/tree/failing-burn-test)
>            Reporter: Jon Meredith
>            Assignee: Jon Meredith
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: 0001-Add-burn-testsome-target-to-build.xml.patch, 0002-Initialize-before-running-LongBufferPoolTest.patch
>
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> The LongBufferPoolTest from the burn tests fails with an assertion error. I added a build target
> to run individual burn tests, and {jasobrown} gave a fix for the uninitialized test setup
> (attached); however, the test now fails on an assertion about recycling buffers.
> To reproduce (with patch applied):
> {{ant burn-testsome -Dtest.name=org.apache.cassandra.utils.memory.LongBufferPoolTest -Dtest.methods=testAllocate}}
> Output:
> {{    [junit] Testcase: testAllocate(org.apache.cassandra.utils.memory.LongBufferPoolTest): FAILED}}
> {{    [junit] null}}
> {{    [junit] junit.framework.AssertionFailedError}}
> {{    [junit] at org.apache.cassandra.utils.memory.BufferPool$Debug.check(BufferPool.java:204)}}
> {{    [junit] at org.apache.cassandra.utils.memory.BufferPool.assertAllRecycled(BufferPool.java:181)}}
> {{    [junit] at org.apache.cassandra.utils.memory.LongBufferPoolTest.testAllocate(LongBufferPoolTest.java:350)}}
> {{    [junit] at org.apache.cassandra.utils.memory.LongBufferPoolTest.testAllocate(LongBufferPoolTest.java:54)}}
> All major branches from 3.0 onward have issues; the trunk branch also warns about references
> not being released before the reference is garbage collected.
> {{[junit] ERROR [Reference-Reaper:1] 2018-09-25 13:59:54,089 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@7f58d19a) to @623704362 was not released before the reference was garbage collected}}
> {{ [junit] ERROR [Reference-Reaper:1] 2018-09-25 13:59:54,089 Ref.java:255 - Allocate trace org.apache.cassandra.utils.concurrent.Ref$State@7f58d19a:}}
> {{ [junit] Thread[pool-2-thread-24,5,main]}}
> {{ [junit] at java.lang.Thread.getStackTrace(Thread.java:1559)}}
> {{ [junit] at org.apache.cassandra.utils.concurrent.Ref$Debug.<init>(Ref.java:245)}}
> {{ [junit] at org.apache.cassandra.utils.concurrent.Ref$State.<init>(Ref.java:175)}}
> {{ [junit] at org.apache.cassandra.utils.concurrent.Ref.<init>(Ref.java:97)}}
> {{ [junit] at org.apache.cassandra.utils.memory.BufferPool$Chunk.setAttachment(BufferPool.java:663)}}
> {{ [junit] at org.apache.cassandra.utils.memory.BufferPool$Chunk.get(BufferPool.java:803)}}
> {{ [junit] at org.apache.cassandra.utils.memory.BufferPool$Chunk.get(BufferPool.java:793)}}
> {{ [junit] at org.apache.cassandra.utils.memory.BufferPool$LocalPool.get(BufferPool.java:388)}}
> {{ [junit] at org.apache.cassandra.utils.memory.BufferPool.maybeTakeFromPool(BufferPool.java:143)}}
> {{ [junit] at org.apache.cassandra.utils.memory.BufferPool.takeFromPool(BufferPool.java:115)}}
> {{ [junit] at org.apache.cassandra.utils.memory.BufferPool.get(BufferPool.java:85)}}
> {{ [junit] at org.apache.cassandra.utils.memory.LongBufferPoolTest$3.allocate(LongBufferPoolTest.java:296)}}
> {{ [junit] at org.apache.cassandra.utils.memory.LongBufferPoolTest$3.testOne(LongBufferPoolTest.java:246)}}
> {{ [junit] at org.apache.cassandra.utils.memory.LongBufferPoolTest$TestUntil.call(LongBufferPoolTest.java:399)}}
> {{ [junit] at org.apache.cassandra.utils.memory.LongBufferPoolTest$TestUntil.call(LongBufferPoolTest.java:379)}}
> {{ [junit] at java.util.concurrent.FutureTask.run(FutureTask.java:266)}}
> {{ [junit] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)}}
> {{ [junit] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)}}
> {{ [junit] at java.lang.Thread.run(Thread.java:748)}}
>  
> Perhaps the environment is not being set up correctly for the tests.
>   



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


