lucene-solr-user mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: Solr and Garbage Collection
Date Sat, 03 Oct 2009 20:03:14 GMT
Yup - I know - I remember the Slashdot discussion on it well - I didn't
mean it that way myself. It caused quite a stir, but from what I could
tell, most people figured out what Sun meant before Sun released any
further info. I just made the same mistake they did :)

Bill Au wrote:
> SUN's initial release notes actually pretty much said that it was
> "unsupported unless you pay". They have since revised the release notes to
> clear up the confusion.
> Bill
>
> On Sat, Oct 3, 2009 at 2:51 PM, Mark Miller <markrmiller@gmail.com> wrote:
>
>> Ah, yes - thanks for the clarification. Didn't pay attention to how
>> ambiguously I was using "supported" there :)
>>
>> Bill Au wrote:
>>> SUN has recently clarified the issue regarding "unsupported unless you
>>> pay" for the G1 garbage collector. Here are the updated release notes for
>>> Java 6 update 14:
>>> http://java.sun.com/javase/6/webnotes/6u14.html
>>>
>>>
>>> G1 will be part of Java 7, fully supported without pay. The version
>>> included in Java 6 update 14 is a beta release. Since it is beta, SUN does
>>> not recommend using it unless you have a support contract because, as with
>>> any beta software, there will be bugs. Non-paying customers may very well
>>> have to wait for the official version in Java 7 for bug fixes.
>>>
>>> Here is more info on the G1 garbage collector:
>>>
>>> http://java.sun.com/javase/technologies/hotspot/gc/g1_intro.jsp
>>>
>>>
>>> Bill
>>>
>>> On Sat, Oct 3, 2009 at 1:28 PM, Mark Miller <markrmiller@gmail.com> wrote:
>>>
>>>> Another option of course, if you're using a recent version of Java 6:
>>>>
>>>> try out the beta-ish, unsupported unless you pay, G1 garbage collector.
>>>> I've only recently started playing with it, but it's supposed to be much
>>>> better than CMS. It's supposedly got much better throughput, it's much
>>>> better at dealing with fragmentation issues (CMS is actually pretty bad
>>>> with fragmentation, come to find out), and overall it's just supposed to
>>>> be a very nice leap ahead in GC. Haven't had a chance to play with it
>>>> much myself, but it's supposed to be fantastic. A whole new approach to
>>>> generational collection for Sun, and much closer to the "real time" GCs
>>>> available from some other vendors.
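If you try G1, it is worth confirming which collector the JVM actually picked up. Per the 6u14 release notes, G1 in that update was enabled with -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC. The sketch below lists the collectors reported through the standard java.lang.management API; the class name ActiveCollectors and the "name starts with G1" check are assumptions for illustration, and collector names vary by JVM vendor and version.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class ActiveCollectors {
    // Names of the collectors the running JVM reports, e.g. "ParNew" and
    // "ConcurrentMarkSweep" under CMS. Exact names depend on the JVM version.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    // Hypothetical heuristic: treat any collector named "G1..." as evidence G1 is active.
    static boolean looksLikeG1(List<String> names) {
        for (String name : names) {
            if (name.startsWith("G1")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> names = collectorNames();
        System.out.println("Collectors: " + names);
        System.out.println("G1 active (heuristic): " + looksLikeG1(names));
    }
}
```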
>>>>
>>>> Mark Miller wrote:
>>>>
>>>>> siping liu wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I read pretty much all the posts on this thread (before and after this
>>>>>> one). It looks like the main suggestion from you and others is to keep
>>>>>> the max heap size (-Xmx) as small as possible (as long as you don't see
>>>>>> an OOM exception). This brings more questions than answers (for me at
>>>>>> least; I'm new to Solr).
>>>>>>
>>>>>> First, our environment and the problem we encountered: Solr 1.4 (nightly
>>>>>> build, downloaded about 2 months ago), Sun JDK 1.6, Tomcat 5.5, running
>>>>>> on Solaris (multi-CPU/cores). The cache settings are from the default
>>>>>> solrconfig.xml (they look very small). At first we used minimal
>>>>>> JAVA_OPTS and quickly ran into a problem similar to the one the original
>>>>>> poster reported: long pauses (seconds to minutes) under load test.
>>>>>> jconsole showed that it pauses on GC. So more JAVA_OPTS were added:
>>>>>> "-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ParallelGCThreads=8
>>>>>> -XX:SurvivorRatio=2 -XX:NewSize=128m -XX:MaxNewSize=512m
>>>>>> -XX:MaxGCPauseMillis=200". The thinking is that with multiple
>>>>>> CPUs/cores we can get GC over with as quickly as possible. With the new
>>>>>> setup, it works fine until Tomcat reaches the heap size, then it blocks
>>>>>> and takes minutes on a "full GC" to get more space from the "tenured
>>>>>> generation". We tried different Xmx values (from very small to large),
>>>>>> with no difference in the long GC time. We never ran into an OOM.
>>>>>
>>>>> MaxGCPauseMillis doesn't work with UseConcMarkSweepGC - it's for use with
>>>>> the Parallel collector. That also doesn't look like a good SurvivorRatio.
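The flag combination pointed out here can be caught mechanically before a load test. A minimal sketch, assuming the flag spellings used in this thread; the class GcFlagCheck is hypothetical and only checks for the MaxGCPauseMillis plus CMS pairing described above, nothing else. RuntimeMXBean.getInputArguments() returns the options passed to the JVM itself, not the arguments to main.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class GcFlagCheck {
    // Warns when a pause-time goal is combined with CMS, which the thread says
    // is a setting for the Parallel collector and has no effect under CMS.
    static List<String> warnings(List<String> jvmArgs) {
        boolean cms = false;
        boolean pauseGoal = false;
        for (String arg : jvmArgs) {
            if (arg.equals("-XX:+UseConcMarkSweepGC")) {
                cms = true;
            }
            if (arg.startsWith("-XX:MaxGCPauseMillis=")) {
                pauseGoal = true;
            }
        }
        List<String> out = new ArrayList<String>();
        if (cms && pauseGoal) {
            out.add("-XX:MaxGCPauseMillis is set together with -XX:+UseConcMarkSweepGC;"
                    + " the pause goal is meant for the Parallel collector");
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        for (String warning : warnings(jvmArgs)) {
            System.out.println("WARNING: " + warning);
        }
    }
}
```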
>>>>>
>>>>>> Questions:
>>>>>>
>>>>>> * In general, various kinds of caching are good for performance. We have
>>>>>> more RAM to use and want to use more caching to boost performance. Isn't
>>>>>> your suggestion (lowering the heap limit) going against that?
>>>>>>
>>>>> Leaving RAM for the filesystem cache is also very important. But you
>>>>> should also have enough RAM for your Solr caches, of course.
>>>>>
>>>>>> * It looks like Solr's caches made their way into the tenured generation
>>>>>> on the heap; that's good. But why do they get GC'ed eventually? I did a
>>>>>> quick check of the Solr code (Solr 1.3, not 1.4) and saw a single use of
>>>>>> WeakReference. Is that what is causing all this? This seems to suggest a
>>>>>> design flaw in Solr's memory management strategy (or just my ignorance
>>>>>> about Solr?). I mean, wouldn't this be the "right" way of doing it: you
>>>>>> allow the user to specify the cache size in solrconfig.xml, the user sets
>>>>>> the heap limit in JAVA_OPTS accordingly, and there is no need to use
>>>>>> WeakReference (BTW, why not SoftReference)?
>>>>>>
>>>>> Do you see "concurrent mode failure" when looking at your GC logs? I.e.:
>>>>>
>>>>> 174.445: [GC 174.446: [ParNew: 66408K->66408K(66416K), 0.0000618
>>>>> secs]174.446: [CMS (concurrent mode failure): 161928K->162118K(175104K),
>>>>> 4.0975124 secs] 228336K->162118K(241520K)
>>>>>
>>>>> That means you are still getting major collections with CMS, and you
>>>>> don't want that. You might try kicking GC off earlier with something
>>>>> like: -XX:CMSInitiatingOccupancyFraction=50
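A concurrent mode failure entry like the one above can be picked out of a GC log with a regular expression. This is a sketch that assumes the exact line layout shown here (the format differs between JVM versions and logging flags); CmsFailureScan is a hypothetical name. For the sample line it reports a pause of about 4.1 seconds with the old generation about 92% full (161928K of 175104K) when the fallback collection started, well above the 50% occupancy at which -XX:CMSInitiatingOccupancyFraction=50 would start a CMS cycle.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CmsFailureScan {
    // Captures old-gen usage before/after and capacity (in K) plus the pause in
    // seconds from the CMS part of a "concurrent mode failure" entry.
    private static final Pattern FAILURE = Pattern.compile(
            "concurrent mode failure\\): (\\d+)K->(\\d+)K\\((\\d+)K\\), ([0-9.]+) secs");

    static final class Failure {
        final long beforeK;
        final long afterK;
        final long capacityK;
        final double pauseSecs;

        Failure(long beforeK, long afterK, long capacityK, double pauseSecs) {
            this.beforeK = beforeK;
            this.afterK = afterK;
            this.capacityK = capacityK;
            this.pauseSecs = pauseSecs;
        }

        // Old-gen occupancy (0..1) when the stop-the-world fallback collection started.
        double occupancyBefore() {
            return (double) beforeK / capacityK;
        }
    }

    // Returns null when the line contains no concurrent mode failure.
    static Failure parse(String line) {
        Matcher m = FAILURE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new Failure(Long.parseLong(m.group(1)), Long.parseLong(m.group(2)),
                Long.parseLong(m.group(3)), Double.parseDouble(m.group(4)));
    }

    public static void main(String[] args) {
        String line = "174.445: [GC 174.446: [ParNew: 66408K->66408K(66416K), 0.0000618 secs]"
                + "174.446: [CMS (concurrent mode failure): 161928K->162118K(175104K), "
                + "4.0975124 secs] 228336K->162118K(241520K)";
        Failure f = parse(line);
        System.out.printf("pause %.2f s, old gen %.0f%% full%n",
                f.pauseSecs, 100 * f.occupancyBefore());
    }
}
```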
>>>>>
>>>>>> * Right now I have a single Tomcat hosting Solr and other applications.
>>>>>> I guess now it's better to have Solr on its own Tomcat, given that it's
>>>>>> tricky to adjust the Java options.
>>>>>>
>>>>>> thanks.
>>>>>>
>>>>>>> From: wunder@wunderwood.org
>>>>>>> To: solr-user@lucene.apache.org
>>>>>>> Subject: RE: Solr and Garbage Collection
>>>>>>> Date: Fri, 25 Sep 2009 09:51:29 -0700
>>>>>>>
>>>>>>> 30ms is not better or worse than 1s until you look at the service
>>>>>>> requirements. For many applications, it is worth dedicating 10% of your
>>>>>>> processing time to GC if that makes the worst-case pause short.
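To see which side of that trade-off a JVM is on, one rough measure is accumulated GC time as a fraction of uptime. A sketch using the standard management beans; GcOverhead is a hypothetical name. getCollectionTime() reports accumulated elapsed collection time, so for concurrent collectors it may include work that ran alongside the application, which makes the figure an approximation rather than pure pause time. It also says nothing about the worst-case pause, which has to come from GC logs.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcOverhead {
    // Fraction of wall-clock time spent in GC; 0 when uptime is not positive.
    static double fraction(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 0.0;
        }
        return (double) gcMillis / uptimeMillis;
    }

    // Sums accumulated collection time over all collectors. getCollectionTime()
    // returns -1 when a collector does not report it, so those are skipped.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        double f = fraction(totalGcMillis(), uptime);
        System.out.printf("GC time: %.1f%% of uptime%n", 100 * f);
    }
}
```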
>>>>>>>
>>>>>>> On the other hand, my experience with the IBM JVM was that the maximum
>>>>>>> query rate was 2-3X better with the concurrent generational GC compared
>>>>>>> to any of their other GC algorithms, so we got the best throughput along
>>>>>>> with the shortest pauses.
>>>>>>>
>>>>>>> Solr garbage generation (for queries) seems to have two major
>>>>>>> components: per-request garbage and cache evictions. With a generational
>>>>>>> collector, these two are handled by separate parts of the collector.
>>>>>>> Per-request garbage should completely fit in the short-term heap
>>>>>>> (nursery), so that it can be collected rapidly and returned to use for
>>>>>>> further requests. If the nursery is too small, the per-request
>>>>>>> allocations will be made in tenured space and sit there until the next
>>>>>>> major GC. Cache evictions are almost always in long-term storage
>>>>>>> (tenured space) because an LRU algorithm guarantees that the garbage
>>>>>>> will be old.
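The LRU point can be illustrated with a size-bounded cache. This is a minimal sketch built on java.util.LinkedHashMap in access order, not Solr's own cache classes; the class name LruCache and its eviction counter are assumptions for illustration. Every evicted entry is the least recently used one, which in a long-running process usually means an object old enough to have been promoted to tenured space.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Size-bounded LRU cache: a LinkedHashMap in access order drops its least
// recently used entry once more than maxEntries entries are present.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;

    private final int maxEntries;
    private long evictions = 0;

    public LruCache(int maxEntries) {
        super(16, 0.75f, true); // true = access order, so get() refreshes an entry
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each put; the eldest entry is the least recently used,
        // i.e. the one that has typically lived longest in the heap.
        boolean evict = size() > maxEntries;
        if (evict) {
            evictions++;
        }
        return evict;
    }

    public long evictions() {
        return evictions;
    }

    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<String, String>(2);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.get("a");      // "a" becomes the most recently used entry
        cache.put("c", "3"); // evicts "b", the least recently used entry
        System.out.println(cache.keySet() + " evictions=" + cache.evictions());
        // prints [a, c] evictions=1
    }
}
```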
>>>>>>>
>>>>>>> Check the growth rate of tenured space (under constant load, of course)
>>>>>>> while increasing the size of the nursery. That rate should drop when the
>>>>>>> nursery gets big enough, then not drop much further as it is increased
>>>>>>> more.
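The measurement described here can be sketched with the memory pool beans, assuming the code runs inside the JVM being measured (jconsole reads the same data over a remote JMX connection). TenuredGrowth is a hypothetical name, the pool lookup relies on HotSpot pool names such as "CMS Old Gen", "PS Old Gen" or "Tenured Gen", and a sample window that spans a major collection will show a drop rather than the growth rate.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

public class TenuredGrowth {
    // Growth in bytes per second between two samples of the same pool.
    static double bytesPerSecond(long usedBefore, long usedAfter, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsedMillis must be > 0");
        }
        return (usedAfter - usedBefore) * 1000.0 / elapsedMillis;
    }

    // Heuristic lookup by pool name; returns null if no old-generation pool matches.
    static MemoryPoolMXBean findTenuredPool() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            if (name.contains("Old Gen") || name.contains("Tenured")) {
                return pool;
            }
        }
        return null;
    }

    public static void main(String[] args) throws InterruptedException {
        MemoryPoolMXBean tenured = findTenuredPool();
        if (tenured == null) {
            System.out.println("No tenured pool found by name");
            return;
        }
        long before = tenured.getUsage().getUsed();
        long start = System.currentTimeMillis();
        Thread.sleep(10000); // sample window; keep the load constant meanwhile
        long after = tenured.getUsage().getUsed();
        double rate = bytesPerSecond(before, after, System.currentTimeMillis() - start);
        System.out.printf("%s grew at %.1f KB/s%n", tenured.getName(), rate / 1024);
    }
}
```

Repeating the measurement for each nursery size (-XX:NewSize/-XX:MaxNewSize) gives the curve described above: the rate should flatten once the nursery holds all per-request garbage.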
>>>>>>>
>>>>>>> After that, reduce the size of tenured space until major GCs start
>>>>>>> happening "too often" (a judgment call). A bigger tenured space means
>>>>>>> longer major GCs and thus longer pauses, so you don't want it oversized
>>>>>>> by too much.
>>>>>>>
>>>>>>> Also check the hit rates of your caches. If the hit rate is low, say 20%
>>>>>>> or less, make that cache much bigger or set it to zero. Either one will
>>>>>>> reduce the number of cache evictions. If you have an HTTP cache in front
>>>>>>> of Solr, zero may be the right choice, since the HTTP cache is
>>>>>>> cherry-picking the easily cacheable requests.
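The rule of thumb above can be written down as a small decision helper. A sketch only: the 20% threshold comes from this message and is a judgment call, the hit and lookup counts are assumed to come from Solr's cache statistics, and CacheHitAdvice and its Advice values are hypothetical names.

```java
public class CacheHitAdvice {
    enum Advice { NO_DATA, KEEP, GROW_OR_DISABLE, DISABLE }

    // "Say 20% or less" from the thread; a judgment call, not a hard rule.
    static final double LOW_HIT_RATIO = 0.20;

    static Advice advise(long hits, long lookups, boolean httpCacheInFront) {
        if (lookups <= 0) {
            return Advice.NO_DATA; // nothing measured yet
        }
        double ratio = (double) hits / lookups;
        if (ratio > LOW_HIT_RATIO) {
            return Advice.KEEP;
        }
        // Either change reduces evictions. With an HTTP cache in front, the easily
        // cacheable requests never reach Solr, so size 0 may be the better choice.
        return httpCacheInFront ? Advice.DISABLE : Advice.GROW_OR_DISABLE;
    }

    public static void main(String[] args) {
        System.out.println(advise(150, 1000, false)); // prints GROW_OR_DISABLE
        System.out.println(advise(150, 1000, true));  // prints DISABLE
        System.out.println(advise(600, 1000, false)); // prints KEEP
    }
}
```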
>>>>>>>
>>>>>>> Note that a commit nearly doubles the memory required, because you have
>>>>>>> two live Searcher objects with all their caches. Make sure you have
>>>>>>> headroom for a commit.
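A back-of-the-envelope version of that headroom check, as a sketch: CommitHeadroom is hypothetical, the per-searcher figure (searcher plus its caches) has to be estimated for your own index, the 300 MB and 400 MB values in main are made up for illustration, and Runtime.maxMemory() only approximates the -Xmx setting.

```java
public class CommitHeadroom {
    // During a commit the old and the new searcher, each with its caches, are
    // both live, so the peak is roughly everything else plus two searchers.
    static long peakDuringCommit(long otherLiveBytes, long perSearcherBytes) {
        return otherLiveBytes + 2 * perSearcherBytes;
    }

    static boolean hasHeadroom(long maxHeapBytes, long otherLiveBytes, long perSearcherBytes) {
        return peakDuringCommit(otherLiveBytes, perSearcherBytes) <= maxHeapBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long maxHeap = Runtime.getRuntime().maxMemory(); // close to -Xmx, may differ slightly
        // Hypothetical figures for illustration only.
        long otherLive = 300 * mb;
        long perSearcher = 400 * mb;
        System.out.printf("peak during commit ~%d MB, max heap %d MB, headroom ok: %b%n",
                peakDuringCommit(otherLive, perSearcher) / mb, maxHeap / mb,
                hasHeadroom(maxHeap, otherLive, perSearcher));
    }
}
```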
>>>>>>>
>>>>>>> If you want to test the tenured space usage, you must test with
>>>>>>> real-world queries. Those are the only way to get accurate cache
>>>>>>> eviction rates.
>>>>>>>
>>>>>>> wunder
>>>> --
>>>> - Mark
>>>>
>>>> http://www.lucidimagination.com

