lucene-java-user mailing list archives

From "McKinley, James T" <james.mckin...@cengage.com>
Subject RE: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)
Date Wed, 11 Feb 2015 16:41:39 GMT
Hi,

A couple of mailing list members have brought the following paragraph from the
https://wiki.apache.org/lucene-java/JavaBugs page to my attention:

"Do not, under any circumstances, run Lucene with the G1 garbage collector. Lucene's test
suite fails with the G1 garbage collector on a regular basis, including bugs that cause index
corruption. There is no person on this planet that seems to understand such bugs (see https://bugs.openjdk.java.net/browse/JDK-8038348,
open for over a year), so don't count on the situation changing soon. This information is
not out of date, and don't think that the next oracle java release will fix the situation."

Since we run Lucene 4.8.1 in production with G1GC on Java(TM) SE Runtime Environment (build
1.7.0_04-b20), Java HotSpot(TM) 64-Bit Server VM (build 23.0-b21, mixed mode), I felt I should
look into the issue and see whether it is reproducible in our environment.  First I read the
bug linked in the above paragraph as well as https://issues.apache.org/jira/browse/LUCENE-5168.
Dawid Weiss and Vladimir Kozlov have already done quite a bit of work tracking this bug down,
and it appears to be limited to the 32-bit JVM (maybe even only on Windows). To quote Dawid
Weiss from the Jira bug:

"My quest continues 

I thought it'd be interesting to see how far back I can trace this
issue. I fetched the official binaries for jdk17 (windows, 32-bit) and
did a binary search with the failing Lucene test command. The results
show that, in short:

...
jdk1.7.0_03: PASSES
jdk1.7.0_04: FAILS
...

and are consistent before and after. jdk1.7.0_04, 64-bit does *NOT*
exhibit the issue (and neither does any version afterwards, it only
happens on 32-bit; perhaps it's because of smaller number of available
registers and the need to spill?).

jdk1.7.0_04 was when G1GC was "officially" made supported but I don't
think this plays a big difference. I'll see if I can bsearch on
mercurial revisions to see which particular revision introduced the
problem. Anyway, the problem has to be a long-standing issue and not a
regression. Which makes it even more interesting I guess.

Dawid"
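
The binary search Dawid describes can be sketched in shell. This is only an illustration
under assumptions: first_failing and the check command are hypothetical names, and it presumes
what the results above suggest, namely that every build before the first failing one passes
and every build after it fails.

```shell
#!/bin/bash
# Binary search for the first failing item in an ordered list.
# Usage: first_failing CHECK ITEM...
# CHECK is a command or function that exits 0 when ITEM passes.
# Assumption: items are ordered oldest to newest and pass/fail is monotonic.
first_failing() {
	local check=$1; shift
	local items=("$@")
	local lo=0 hi=${#items[@]} mid
	while [ "$lo" -lt "$hi" ]; do
		mid=$(( (lo + hi) / 2 ))
		if "$check" "${items[$mid]}"; then
			lo=$((mid + 1))   # this item passes, so the first failure is later
		else
			hi=$mid           # this item fails, so the first failure is here or earlier
		fi
	done
	if [ "$lo" -lt "${#items[@]}" ]; then
		echo "${items[$lo]}"
		return 0
	fi
	return 1                  # nothing failed
}

# Hypothetical usage, where run_lucene_test would run the failing test
# command against the JDK installed in the given directory:
#   first_failing run_lucene_test jdk1.7.0_01 jdk1.7.0_02 jdk1.7.0_03 jdk1.7.0_04
```

Each probe halves the remaining range, so even a long list of builds needs only a handful of
test runs.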

In addition, the second-to-last comment in the LUCENE-5168 bug is: "I don't think this is closely
related to G1GC. It looks more that G1GC happily triggers this bug in this special case."

To check whether the bug is reproducible in our specific environment, I checked out
the tag for Lucene 4.8.1 (http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_8_1)
and made the following change to common-build.xml:

gada@C006129:~/workspace-java/lucene_solr_4_8_1/lucene$ svn diff common-build.xml 
Index: common-build.xml
===================================================================
--- common-build.xml	(revision 1658458)
+++ common-build.xml	(working copy)
@@ -92,7 +92,7 @@
   </path>
 
   <!-- default arguments to pass to JVM executing tests -->
-  <property name="args" value=""/>
+  <property name="args" value="-XX:+UnlockDiagnosticVMOptions -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:InitiatingHeapOccupancyPercent=65 -XX:ParallelGCThreads=12 -verbose:gc -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintAdaptiveSizePolicy -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -Xloggc:/home/gada/tmp/lucene-test-gc.log -XX:LogFile=/home/gada/tmp/lucene-test-vmop.log -XX:+LogVMOutput -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1"/>
 
   <property name="tests.seed" value="" />
 
I then ran the following script:

#!/bin/bash
count=0
while ant test ; do
	count=$((count + 1))
	printf '\n\n\nrun %d completed without errors\n\n\n' "$count"
	if [ "$count" -ge 100 ]; then
		break
	fi
	sleep 1
done
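
A variant of the loop above that also reports which run failed might look like the following.
It is a sketch, not what was actually run: repeat_until_failure is a hypothetical helper name,
and the command to repeat is passed as arguments.

```shell
#!/bin/bash
# Run a command up to MAX times, stopping at the first failure.
# Usage: repeat_until_failure MAX COMMAND [ARGS...]
# Returns 0 if every run succeeded, 1 as soon as one run fails.
repeat_until_failure() {
	local max=$1; shift
	local count=0
	while [ "$count" -lt "$max" ]; do
		if ! "$@"; then
			# Report the 1-based number of the run that failed.
			printf 'run %d FAILED\n' "$((count + 1))"
			return 1
		fi
		count=$((count + 1))
		printf 'run %d completed without errors\n' "$count"
	done
	printf 'all %d runs passed\n' "$max"
}
```

For example, "repeat_until_failure 100 ant test" reproduces the 100-run limit used above.
Because the build file declares args as an ordinary Ant property, passing
-Dargs="-XX:+UseG1GC" on the command line should override it without editing
common-build.xml, though that was not verified here.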

All tests ran successfully 100 times in a row on a dual 6-core CPU Intel Xeon Lenovo C30 ThinkStation
with 64GB RAM running the Ubuntu 14.04 LTS Linux distribution.  I also successfully ran the
test suite a few times on Java(TM) SE Runtime Environment (build 1.7.0_55-b13) Java HotSpot(TM)
64-Bit Server VM (build 24.55-b03, mixed mode) since I had it available.
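
Since the failure reported in LUCENE-5168 appears specific to 32-bit JVMs, it can help to record
which kind of JVM ran the tests. A minimal sketch, assuming the HotSpot banner contains "64-Bit"
on 64-bit builds (as in the two banners quoted above) and treating anything else as 32-bit;
jvm_bitness is a hypothetical helper name:

```shell
#!/bin/bash
# Classify a JVM as 32- or 64-bit from the text of its version banner.
# Assumption: 64-bit HotSpot builds print "64-Bit" in the VM line;
# any banner without that marker is reported as 32-bit.
jvm_bitness() {
	case "$1" in
		*64-Bit*) echo 64 ;;
		*)        echo 32 ;;
	esac
}

# Usage (java -version writes its banner to stderr, hence the redirect):
#   jvm_bitness "$(java -version 2>&1)"
```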

TL;DR:

I think perhaps the sentence "Do not, under any circumstances, run Lucene with the G1 garbage
collector." is a bit too strong.  Maybe a more balanced statement is in order?  For example,
"we've found that the OpenJDK/Oracle 32-bit JVM (if only on Windows, say only on Windows)
has a bug that, when used in combination with the G1 garbage collector, causes incorrect
code to be produced, possibly resulting in index corruption", or something along those lines.
It seems a shame to possibly scare new Lucene users away from using G1GC with the 64-bit
JVM, given that it has better performance on large heaps, which are becoming more common today.

FWIW,
Jim
________________________________________
From: McKinley, James T [james.mckinley@cengage.com]
Sent: Monday, February 09, 2015 11:00 AM
To: java-user@lucene.apache.org
Subject: RE: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)

OK thanks Erick, I have put a story in our jira backlog to investigate the G1GC issues with
the Lucene test suite.  I don't know if we'll be able to shed any light on the issue, but
since we're using Lucene with Java 7 G1GC, I guess we better investigate it.

Jim
________________________________________
From: Erick Erickson [erickerickson@gmail.com]
Sent: Saturday, February 07, 2015 2:22 PM
To: java-user
Subject: Re: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)

The G1GC issue referenced by Robert Muir on the Wiki page is at the
Lucene level. Lucene, of course, is critically important to Solr, so
from that perspective it is about Solr too.

https://wiki.apache.org/lucene-java/JavaBugs

And, I assume, it also applies to your custom app.

FWIW,
Erick

On Fri, Feb 6, 2015 at 12:10 PM, McKinley, James T
<james.mckinley@cengage.com> wrote:
> Just to be clear in case there was any confusion about my previous message regarding
> G1GC, we do not use Solr, my team works on a proprietary Lucene-based search engine.  Consequently,
> I can't really give any advice regarding Solr with G1GC, but for our uses (so far anyway),
> G1GC seems to work well with Lucene.
>
> Jim
> ________________________________________
> From: Piotr Idzikowski [piotridzikowski@gmail.com]
> Sent: Friday, February 06, 2015 5:35 AM
> To: java-user@lucene.apache.org
> Subject: Re: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)
>
> Hello.
> A slightly delayed question, but I recently found these articles:
> https://wiki.apache.org/solr/SolrPerformanceProblems
> https://wiki.apache.org/solr/ShawnHeisey#GC_Tuning
>
> Especially this part from the first URL:
> *Using the ConcurrentMarkSweep (CMS) collector with tuning parameters is a
> very good option for Solr, but with the latest Java 7 releases (7u72 at
> the time of this writing), G1 is looking like a better option, if the
> -XX:+ParallelRefProcEnabled option is used.*
>
> How does it play with *"Do not, under any circumstances, run Lucene with
> the G1 garbage collector."*
> from https://wiki.apache.org/lucene-java/JavaBugs?
>
> Regards
> Piotr
>
> On Tue, Jan 27, 2015 at 9:55 PM, McKinley, James T <
> james.mckinley@cengage.com> wrote:
>
>> Hi Uwe,
>>
>> OK, thanks for the info.  We'll see if we can download the Lucene test
>> suite and check it out.
>>
>> FWIW, we use G1GC in our production runtime (~70 12-16 core Cisco UCS and
>> HP Gen7/Gen8 nodes with 20+ GB heaps using Java 7 and Lucene 4.8.1 with
>> pairs of 30 index partitions with 15M-23M docs each) and have not
>> experienced any VM crashes (well, maybe a couple, but not directly
>> traceable to G1 to my knowledge).  We have found some undocumented pauses
>> in G1 due to very large object arrays and filed a bug report which was
>> confirmed and also affects CMS (we worked around this in our code using
>> memory mapping of some files whose contents we previously held all in
>> RAM).  I think the only index corruption we've ever seen was in our index
>> creation workflow (~30 HP Gen7 nodes with 27GB heaps) but this was using
>> Parallel GC since it is a batch system, so that corruption (which we've not
>> seen recently and never found a cause for) was definitely not due to G1GC.
>>
>> G1GC has bugs as does CMS but we've found it to work pretty well so far in
>> our runtime system.  Of course YMMV, thanks again for the info.
>>
>> Jim
>> ________________________________________
>> From: Uwe Schindler [uwe@thetaphi.de]
>> Sent: Tuesday, January 27, 2015 3:02 PM
>> To: java-user@lucene.apache.org
>> Subject: RE: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)
>>
>> Hi,
>>
>> About G1GC. We consistently see problems when running the Lucene Testsuite
>> with G1GC enabled. The people from Elasticsearch concluded:
>>
>> "There is a newer GC called the Garbage First GC (G1GC). This newer GC is
>> designed to minimize pausing even more than CMS, and operate on large
>> heaps. It works by dividing the heap into regions and predicting which
>> regions contain the most reclaimable space. By collecting those regions
>> first (garbage first), it can minimize pauses and operate on very large
>> heaps.
>>
>> Sounds great! Unfortunately, G1GC is still new, and fresh bugs are found
>> routinely. These bugs are usually of the segfault variety, and will cause
>> hard crashes. The Lucene test suite is brutal on GC algorithms, and it
>> seems that G1GC hasn’t had the kinks worked out yet.
>>
>> We would like to recommend G1GC someday, but for now, it is simply not
>> stable enough to meet the demands of Elasticsearch and Lucene."
>> (
>> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_don_8217_t_touch_these_settings.html
>> )
>>
>> In fact, the problems with G1GC can sometimes lead to index corruption,
>> and are hard to reproduce. So it is better not to use it...
>>
>> Uwe
>>
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: uwe@thetaphi.de
>>
>>
>> > -----Original Message-----
>> > From: McKinley, James T [mailto:james.mckinley@cengage.com]
>> > Sent: Tuesday, January 27, 2015 8:58 PM
>> > To: java-user@lucene.apache.org
>> > Subject: RE: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)
>> >
>> > Why do you say not to use G1GC?  We are using Java 7 & G1GC with Lucene
>> > 4.8.1 in production.  Thanks.
>> >
>> > Jim
>> > ________________________________________
>> > From: Uwe Schindler [uwe@thetaphi.de]
>> > Sent: Tuesday, January 27, 2015 2:49 PM
>> > To: java-user@lucene.apache.org; 'kiwi clive'
>> > Subject: RE: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)
>> >
>> > Java 8 update 20 or later is also fine. At the current time, always use the
>> > latest update release and you will be fine with Java 7 and Java 8. Don't use
>> > older releases and don't use the G1 Garbage Collector.
>> >
>> > -----
>> > Uwe Schindler
>> > H.-H.-Meier-Allee 63, D-28213 Bremen
>> > http://www.thetaphi.de
>> > eMail: uwe@thetaphi.de
>> >
>> >
>> > > -----Original Message-----
>> > > From: kiwi clive [mailto:kiwi_clive@yahoo.com.INVALID]
>> > > Sent: Tuesday, January 27, 2015 8:03 PM
>> > > To: java-user@lucene.apache.org
>> > > Subject: Re: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)
>> > >
>> > > Hi Hoss,
>> > > Many thanks for the information. This looks very encouraging as the
>> > > Java7 bug I remember  was fixed and as far as I know, we should not be
>> > > affected by the others.
>> > > I'll put a few tests together and put my toe in the water :-)
>> > > Clive
>> > >
>> > >  From: Chris Hostetter <hossman_lucene@fucit.org>
>> > >  To: "java-user@lucene.apache.org" <java-user@lucene.apache.org>; kiwi clive <kiwi_clive@yahoo.com>
>> > >  Sent: Tuesday, January 27, 2015 4:01 PM
>> > >  Subject: Re: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)
>> > >
>> > >
>> > >
>> > >
>> > > : I seem to remember reading that certain versions of lucene were
>> > > : incompatible with some java versions although I cannot find anything to
>> > > : verify this. As we have tens of thousands of large indexes, backwards
>> > > : compatibility without the need to reindex on an upgrade is of prime
>> > > : importance to us.
>> > >
>> > > All known JVM bugs affecting Lucene are listed here...
>> > >
>> > > https://wiki.apache.org/lucene-java/JavaBugs
>> > >
>> > >
>> > > -Hoss
>> > > http://www.lucidworks.com/
>> > >
>> > > ---------------------------------------------------------------------
>> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> > > For additional commands, e-mail: java-user-help@lucene.apache.org