lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (LUCENE-8780) Improve ByteBufferGuard in Java 11
Date Mon, 29 Apr 2019 16:32:00 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-8780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829428#comment-16829428 ]

Uwe Schindler edited comment on LUCENE-8780 at 4/29/19 4:31 PM:
----------------------------------------------------------------

Hi,
I did some tests overnight. The fluctuating results can be improved by removing the
null check on the "cleaner". Normally this check should be optimized away, but that seems
to be too hard for Hotspot. Without it you still see some variance in the results, but it's better.

In general the new code is a bit slower, which is more visible with shorter runtimes,
so it looks like optimizing away the VarHandle simply takes longer, and this degrades average
performance. I don't know how long a warmup the benchmark runs.

bq. Every time I read about those memory ordering models I seem to get it and two weeks later
I'm again confused...

That's the same for me. But our use case seems to be pretty well explained in the above quotes
from Doug Lea. They clearly say that a volatile write is stronger than an opaque read; the
volatile write only becomes visible to the opaque read eventually (and never to a plain read,
if that read was optimized away).

I was already talking to the Hotspot guys: in short, there is no way to make ByteBufferGuard
"safe" unless you use a volatile read or an acquire/release fence (which can also be implemented
with VarHandles). When using those, I was unable to crash my VM, but the slowdown is definitely
there. Also, manually putting in fences does not help.
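
For illustration, here is a compact sketch of such a "safe but slower" variant using
acquire/release access modes. The names are illustrative, not the actual ByteBufferGuard code:

{code:java}
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

final class AcquireReleaseGuard {
  @SuppressWarnings("unused")  // only ever accessed through the VarHandle
  private volatile boolean invalidated = false;

  private static final VarHandle INVALIDATED;
  static {
    try {
      INVALIDATED = MethodHandles.lookup()
          .findVarHandle(AcquireReleaseGuard.class, "invalidated", boolean.class);
    } catch (ReflectiveOperationException e) {
      throw new ExceptionInInitializerError(e);
    }
  }

  void invalidate() {
    INVALIDATED.setRelease(this, true);  // release store on close
  }

  void ensureValid() {
    // acquire load on every single access: safe, but measurably slower
    if ((boolean) INVALIDATED.getAcquire(this)) {
      throw new IllegalStateException("already closed");
    }
  }
}
{code}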

The main difference between a plain read and an opaque read (as used here) is not that the
produced CPU instructions are different; the difference is only in how the optimizer may
transform the code. With plain reads, a simple spin loop will never end, as the Hotspot compiler
clearly sees that the value can't change in the current thread, and because the read is not
volatile, opaque or anything stronger, it will remove the check for this thread. This does not
only happen in spin loops, it also happens in our code if it's executed often enough. This is
not a bug, it's intended behavior.
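
To see the difference in isolation, here is a self-contained sketch (not Lucene code): with
getOpaque the loop below is guaranteed to terminate; with a plain field read it may legally
spin forever once the JIT has optimized it.

{code:java}
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class SpinLoopDemo {
  @SuppressWarnings("unused")  // only ever accessed through the VarHandle
  private volatile boolean stop = false;

  private static final VarHandle STOP;
  static {
    try {
      STOP = MethodHandles.lookup().findVarHandle(SpinLoopDemo.class, "stop", boolean.class);
    } catch (ReflectiveOperationException e) {
      throw new ExceptionInInitializerError(e);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    SpinLoopDemo demo = new SpinLoopDemo();
    Thread spinner = new Thread(() -> {
      // Opaque read: required to eventually observe the write below. A plain
      // read here could legally be hoisted, turning this into while (true).
      while (!(boolean) STOP.getOpaque(demo)) {
        // spin
      }
      System.out.println("spinner saw the write and terminated");
    });
    spinner.start();
    Thread.sleep(100);
    STOP.setOpaque(demo, true);  // opaque (or stronger) write from another thread
    spinner.join();
  }
}
{code}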

I tuned the testcase to crash always: just insert a loop of 10,000 reads BEFORE you unmap,
then it fails always (on Java 8). With opaque it also crashed, but not reproducibly,
so there is an improvement. IMHO, the reason why we see a slowdown for some queries could
come from exactly that: Hotspot removes the "if (invalidated)" check, as it cannot change
in the current thread. With opaque it can't do that, so the code is slower in the long run.
I will try to get some assembly output later, I just have to install hsdis.dll.
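
The reproduction pattern looks roughly like this. This is a self-contained sketch against a
raw MappedByteBuffer, not the actual Lucene testcase; the unmap goes through the unsupported
internal sun.misc.Unsafe#invokeCleaner (Java 9+), and everything after it is undefined behavior:

{code:java}
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class UnmapCrashSketch {
  public static void main(String[] args) throws Exception {
    Path tmp = Files.createTempFile("mmap", ".bin");
    Files.write(tmp, new byte[4096]);
    try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
      ByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, 4096);
      long sum = 0;
      // Warm-up loop BEFORE unmapping: gives Hotspot time to compile the read
      // path (and, with a plain-read guard, to optimize the check away).
      for (int i = 0; i < 10_000; i++) {
        sum += buf.get(i % 4096);
      }
      unmap(buf);          // from here on the mapping is gone
      sum += buf.get(0);   // may SIGSEGV the whole JVM
      System.out.println(sum);
    }
  }

  // Unsupported internal API; do not use outside of experiments.
  private static void unmap(ByteBuffer buf) throws Exception {
    Class<?> unsafeClass = Class.forName("sun.misc.Unsafe");
    Field f = unsafeClass.getDeclaredField("theUnsafe");
    f.setAccessible(true);
    Object unsafe = f.get(null);
    Method invokeCleaner = unsafeClass.getMethod("invokeCleaner", ByteBuffer.class);
    invokeCleaner.invoke(unsafe, buf);
  }
}
{code}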

To make it bulletproof, we have to wait for official support in the ByteBuffer API for
unmapping (https://bugs.openjdk.java.net/browse/JDK-4724038); with our checks alone we cannot
make it safe. Another approach would be to expand the SIGSEGV signal handler checks in the JVM
to throw an InternalError (they do this already if the operating system truncates the file and
the mapped area therefore changes). I don't know why they don't generally do a SIGSEGV check
and, if it happens inside a DirectBuffer access, just throw an exception... (possibly delayed,
as in the truncated file case).

So we have to decide what we want to do as a workaround:
- We can't be bulletproof.
- With our current code we can catch most cases where you open an index, start a query and
then close the reader. This only works shortly after the program has started, while nothing is
optimized yet. As soon as multiple queries have already run and you close the reader later, it
SIGSEGVs almost always. Not even "opaque" semantics help here.
- With the new code we can go a bit further, but we get some slowdown, and it's still
not safe.



> Improve ByteBufferGuard in Java 11
> ----------------------------------
>
>                 Key: LUCENE-8780
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8780
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/store
>    Affects Versions: master (9.0)
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>            Priority: Major
>              Labels: Java11
>         Attachments: LUCENE-8780.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> In LUCENE-7409 we added {{ByteBufferGuard}} to protect MMapDirectory from crashing the
JVM with SIGSEGV when you close and unmap the mmapped buffers of an IndexInput while another
thread is still accessing it.
> The idea was to do a volatile write (to flush the caches and trigger a full fence)
and set a non-volatile boolean to true. All accesses would check the boolean and stop the
caller from accessing the underlying ByteBuffer. This worked most of the time, until the JVM
optimized away the plain read of the boolean (you can easily see this after some runtime
of our by-default ignored testcase).
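> For reference, a sketch of that original scheme with illustrative names (not the exact
> LUCENE-7409 source):
> {code:java}
> // Java 8 era scheme: a plain boolean flag plus a separate volatile write
> // whose only purpose is to trigger a full fence on close.
> final class OldGuardSketch {
>   private boolean invalidated = false;  // plain, non-volatile flag
>   private volatile int barrier;         // written only to force a full fence
>
>   void invalidate() {
>     invalidated = true;
>     barrier = 0;  // volatile write flushes the caches (full fence)
>   }
>
>   void ensureValid() {
>     // plain read: Hotspot may legally optimize this check away over time,
>     // which is exactly the failure mode described above
>     if (invalidated) {
>       throw new IllegalStateException("already closed");
>     }
>   }
> }
> {code}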
> With master on Java 11 we can improve the whole thing. Using VarHandles you can use
any access type when reading or writing the boolean. After reading Doug Lea's explanation <http://gee.cs.oswego.edu/dl/html/j9mm.html>
and some testing, I was no longer able to crash my JDK (even after unmapping
bytebuffers for minutes).
> The approach is the same: we do a full-fenced write (a standard volatile write) when we
unmap, then we yield the thread (to let in-flight reads in other threads finish) and then unmap
all byte buffers.
> On the read side, instead of using a plain read, we use the new "opaque
read". Opaque reads are the same as plain reads; they only have different ordering requirements.
Actually, the main difference is explained by Doug like this: "For example in constructions
in which the only modification of some variable x is for one thread to write in Opaque (or
stronger) mode, X.setOpaque(this, 1), any other thread spinning in while(X.getOpaque(this)!=1){}
will eventually terminate. Note that this guarantee does NOT hold in Plain mode, in which
spin loops may (and usually do) infinitely loop -- they are not required to notice that a
write ever occurred in another thread if it was not seen on first encounter." And that's
what we want to have: we don't want to do volatile reads, but we want to prevent the compiler
from optimizing away our read of the boolean. So we want it to "eventually" see the change.
Thanks to the much stronger volatile write, the cache effects should be visible even faster
(as in our Java 8 approach; we have just improved the read side).
> The new code is much slimmer (theoretically we could also use an AtomicBoolean and its
new {{getOpaque()}} method, but I wanted to avoid extra method calls, so I used
a VarHandle directly).
> It's set up like this (a compact sketch follows the list):
> - The underlying boolean field is a private member (with a @SuppressWarnings annotation, as
it appears unused to the Java compiler), marked as volatile (that's the recommendation, but in
reality it does not matter at all).
> - We create a VarHandle to access this boolean; we never access the field directly (this is
why the volatile marking does not affect us).
> - We use VarHandle.setVolatile() to change our "invalidated" boolean to "true", enforcing
a full fence.
> - On the read side we use VarHandle.getOpaque() instead of VarHandle.get() (the latter matches
our old code for Java 8).
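> A minimal sketch of that setup with illustrative names (the real class additionally wraps
> the actual ByteBuffer accesses and throws Lucene's AlreadyClosedException):
> {code:java}
> import java.lang.invoke.MethodHandles;
> import java.lang.invoke.VarHandle;
>
> final class GuardSketch {
>   @SuppressWarnings("unused")  // only ever accessed through the VarHandle
>   private volatile boolean invalidated = false;
>
>   private static final VarHandle INVALIDATED;
>   static {
>     try {
>       INVALIDATED = MethodHandles.lookup()
>           .findVarHandle(GuardSketch.class, "invalidated", boolean.class);
>     } catch (ReflectiveOperationException e) {
>       throw new ExceptionInInitializerError(e);
>     }
>   }
>
>   void invalidate() {
>     INVALIDATED.setVolatile(this, true);  // full fence on the write side
>     Thread.yield();                       // let in-flight reads finish
>     // ...then unmap all byte buffers
>   }
>
>   void ensureValid() {
>     if ((boolean) INVALIDATED.getOpaque(this)) {  // opaque read on every access
>       throw new IllegalStateException("already closed");
>     }
>   }
> }
> {code}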
> I had to tune our test a bit, as the VarHandles make it take longer until it actually
crashes (since the optimizations kick in later). I also used a random number generator for the
reads to prevent the optimizer from removing all the bytebuffer reads. When we commit this, we
can disable the test again (it takes approx. 50 secs on my machine).
> I'd still like to see the differences between the plain read and the opaque read in production,
so maybe [~mikemccand] or [~rcmuir] can do a comparison with the nightly benchmarker?
> Have fun, maybe [~dweiss] has some ideas, too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


