lucene-solr-user mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: Solr and Garbage Collection
Date Sat, 03 Oct 2009 20:10:34 GMT
Actually, now that I remember it, I think the main giveaway, as
someone mentioned back when Slashdot had that misleading post, was that
it was slated for OpenJDK - which is open source :) Typical Slashdot though.

Mark Miller wrote:
> Yup - I know - I remember the Slashdot discussion on it well - I didn't
> mean it that way myself. It caused quite a stir, but most people figured
> out what they meant before they released any further info from what I
> could tell. I just made the same mistake they did :)
>
> Bill Au wrote:
>   
>> SUN's initial release notes actually pretty much said that it was
>> "unsupported unless you pay".  They have since revised the release notes to
>> clear up the confusion.
>> Bill
>>
>> On Sat, Oct 3, 2009 at 2:51 PM, Mark Miller <markrmiller@gmail.com> wrote:
>>
>>   
>>     
>>> Ah, yes - thanks for the clarification. Didn't pay attention to how
>>> ambiguously I was using "supported" there :)
>>>
>>> Bill Au wrote:
>>>     
>>>       
>>>> SUN has recently clarified the issue regarding "unsupported unless you pay"
>>>> for the G1 garbage collector. Here are the updated release notes for Java 6
>>>> update 14:
>>>> http://java.sun.com/javase/6/webnotes/6u14.html
>>>>
>>>>
>>>> G1 will be part of Java 7, fully supported without pay.  The version
>>>> included in Java 6 update 14 is a beta release.  Since it is beta, SUN does
>>>> not recommend using it unless you have a support contract because as with
>>>> any beta software there will be bugs.  Non paying customers may very well
>>>> have to wait for the official version in Java 7 for bug fixes.
>>>>
>>>> Here is more info on the G1 garbage collector:
>>>>
>>>> http://java.sun.com/javase/technologies/hotspot/gc/g1_intro.jsp
>>>>
>>>>
>>>> Bill
>>>>
>>>> On Sat, Oct 3, 2009 at 1:28 PM, Mark Miller <markrmiller@gmail.com> wrote:
>>>>
>>>>> Another option of course, if you're using a recent version of Java 6:
>>>>>
>>>>> try out the beta-ish, unsupported unless you pay, G1 garbage collector.
>>>>> I've only recently started playing with it, but it's supposed to be much
>>>>> better than CMS. It's supposedly got much better throughput, it's much
>>>>> better at dealing with fragmentation issues (CMS is actually pretty bad
>>>>> with fragmentation, come to find out), and overall it's just supposed to
>>>>> be a very nice leap ahead in GC. Haven't had a chance to play with it
>>>>> much myself, but it's supposed to be fantastic. A whole new approach to
>>>>> generational collection for Sun, and much closer to the "real time" GCs
>>>>> available from some other vendors.
>>>>>
>>>>> Mark Miller wrote:
>>>>>
>>>>>         
>>>>>           
>>>>>> siping liu wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I read pretty much all posts on this thread (before and after this one).
>>>>>>> Looks like the main suggestion from you and others is to keep max heap size
>>>>>>> (-Xmx) as small as possible (as long as you don't see OOM exception). This
>>>>>>> brings more questions than answers (for me at least. I'm new to Solr).
>>>>>>>
>>>>>>> First, our environment and problem encountered: Solr1.4 (nightly build,
>>>>>>> downloaded about 2 months ago), Sun JDK1.6, Tomcat 5.5, running on
>>>>>>> Solaris(multi-cpu/cores). The cache setting is from the default
>>>>>>> solrconfig.xml (looks very small). At first we used minimum JAVA_OPTS and
>>>>>>> quickly ran into a problem similar to the one the original poster reported --
>>>>>>> long pauses (seconds to minutes) under load test. jconsole showed that it
>>>>>>> pauses on GC. So more JAVA_OPTS got added: "-XX:+UseConcMarkSweepGC
>>>>>>> -XX:+UseParNewGC -XX:ParallelGCThreads=8 -XX:SurvivorRatio=2
>>>>>>> -XX:NewSize=128m -XX:MaxNewSize=512m -XX:MaxGCPauseMillis=200", the thinking
>>>>>>> being that with multiple cpu/cores we can get GC over with as quickly as
>>>>>>> possible. With the new setup, it works fine until Tomcat reaches heap size,
>>>>>>> then it blocks and takes minutes on "full GC" to get more space from the
>>>>>>> "tenured generation". We tried different Xmx (from very small to large), no
>>>>>>> difference in long GC time. We never ran into OOM.
>>>>>>>
>>>>>> MaxGCPauseMillis doesn't work with UseConcMarkSweepGC - it's for use with
>>>>>> the Parallel collector. That also doesn't look like a good SurvivorRatio.
>>>>>>
>>>>>>> Questions:
>>>>>>>
>>>>>>> * In general various cachings are good for performance, we have more RAM
>>>>>>> to use and want to use more caching to boost performance, isn't your
>>>>>>> suggestion (of lowering heap limit) going against that?
>>>>>>>
>>>>>> Leaving RAM for the FileSystem cache is also very important. But you
>>>>>> should also have enough RAM for your Solr caches of course.
>>>>>>
>>>>>>> * Looks like Solr caching made its way into tenure-generation on heap,
>>>>>>> that's good. But why do they get GC'ed eventually?? I did a quick check of
>>>>>>> Solr code (Solr 1.3, not 1.4), and see a single instance of using
>>>>>>> WeakReference. Is that what is causing all this? This seems to suggest a
>>>>>>> design flaw in Solr's memory management strategy (or just my ignorance
>>>>>>> about Solr?). I mean, wouldn't this be the "right" way of doing it -- you
>>>>>>> allow user to specify the cache size in solrconfig.xml, then user can set
>>>>>>> up heap limit in JAVA_OPTS accordingly, and no need to use WeakReference
>>>>>>> (BTW, why not SoftReference)??
>>>>>>>
>>>>>> Do you see concurrent mode failure when looking at your gc logs? ie:
>>>>>>
>>>>>> 174.445: [GC 174.446: [ParNew: 66408K->66408K(66416K), 0.0000618
>>>>>> secs]174.446: [CMS (concurrent mode failure): 161928K->162118K(175104K),
>>>>>> 4.0975124 secs] 228336K->162118K(241520K)
>>>>>>
>>>>>> That means you are still getting major collections with CMS, and you
>>>>>> don't want that. You might try kicking GC off earlier with something
>>>>>> like: -XX:CMSInitiatingOccupancyFraction=50
>>>>>>
>>>>>>
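A concurrent mode failure entry like the one quoted above can also be picked out of a GC log mechanically. What follows is only a minimal sketch, assuming the log uses the same ParNew/CMS entry format as that sample. The function name `find_concurrent_mode_failures` and the dictionary keys are made up for illustration and are not part of any JVM tooling.

```python
import re

# Matches the CMS segment of a "concurrent mode failure" entry and captures
# tenured occupancy before/after (KB), tenured capacity (KB), and pause time.
# Assumption: entries look like the sample quoted in this thread.
CMF_PATTERN = re.compile(
    r"\[CMS \(concurrent mode failure\): "
    r"(?P<before>\d+)K->(?P<after>\d+)K\((?P<capacity>\d+)K\), "
    r"(?P<secs>\d+\.\d+) secs\]"
)


def find_concurrent_mode_failures(log_text):
    """Return one dict per concurrent mode failure found in log_text."""
    # Entries may be wrapped across lines (as in the email), so collapse
    # all runs of whitespace into single spaces before matching.
    flat = " ".join(log_text.split())
    failures = []
    for match in CMF_PATTERN.finditer(flat):
        failures.append({
            "before_kb": int(match.group("before")),
            "after_kb": int(match.group("after")),
            "capacity_kb": int(match.group("capacity")),
            "pause_secs": float(match.group("secs")),
        })
    return failures


# The entry from the thread, wrapped the same way it was in the email.
SAMPLE_LOG = """174.445: [GC 174.446: [ParNew: 66408K->66408K(66416K), 0.0000618
secs]174.446: [CMS (concurrent mode failure): 161928K->162118K(175104K),
4.0975124 secs] 228336K->162118K(241520K)"""

if __name__ == "__main__":
    for failure in find_concurrent_mode_failures(SAMPLE_LOG):
        print(failure)
```

Any hit means CMS fell back to a stop-the-world major collection, which is the case the advice above about starting CMS earlier is aimed at.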
>>>>>>> * Right now I have a single Tomcat hosting Solr and other applications.
>>>>>>> I guess now it's better to have Solr on its own Tomcat, given that it's
>>>>>>> tricky to adjust the java options.
>>>>>>>
>>>>>>> thanks.
>>>>>>>
>>>>>>>> From: wunder@wunderwood.org
>>>>>>>> To: solr-user@lucene.apache.org
>>>>>>>> Subject: RE: Solr and Garbage Collection
>>>>>>>> Date: Fri, 25 Sep 2009 09:51:29 -0700
>>>>>>>>
>>>>>>>> 30ms is not better or worse than 1s until you look at the service
>>>>>>>> requirements. For many applications, it is worth dedicating 10% of your
>>>>>>>> processing time to GC if that makes the worst-case pause short.
>>>>>>>>
>>>>>>>> On the other hand, my experience with the IBM JVM was that the maximum
>>>>>>>> query rate was 2-3X better with the concurrent generational GC compared to
>>>>>>>> any of their other GC algorithms, so we got the best throughput along with
>>>>>>>> the shortest pauses.
>>>>>>>>
>>>>>>>> Solr garbage generation (for queries) seems to have two major components:
>>>>>>>> per-request garbage and cache evictions. With a generational collector,
>>>>>>>> these two are handled by separate parts of the collector. Per-request
>>>>>>>> garbage should completely fit in the short-term heap (nursery), so that it
>>>>>>>> can be collected rapidly and returned to use for further requests. If the
>>>>>>>> nursery is too small, the per-request allocations will be made in tenured
>>>>>>>> space and sit there until the next major GC. Cache evictions are almost
>>>>>>>> always in long-term storage (tenured space) because an LRU algorithm
>>>>>>>> guarantees that the garbage will be old.
>>>>>>>>
>>>>>>>> Check the growth rate of tenured space (under constant load, of course)
>>>>>>>> while increasing the size of the nursery. That rate should drop when the
>>>>>>>> nursery gets big enough, then not drop much further as it is increased
>>>>>>>> more.
>>>>>>>>
>>>>>>>> After that, reduce the size of tenured space until major GCs start
>>>>>>>> happening "too often" (a judgment call). A bigger tenured space means
>>>>>>>> longer major GCs and thus longer pauses, so you don't want it oversized by
>>>>>>>> too much.
>>>>>>>>
>>>>>>>> Also check the hit rates of your caches. If the hit rate is low, say 20% or
>>>>>>>> less, make that cache much bigger or set it to zero. Either one will reduce
>>>>>>>> the number of cache evictions. If you have an HTTP cache in front of Solr,
>>>>>>>> zero may be the right choice, since the HTTP cache is cherry-picking the
>>>>>>>> easily cacheable requests.
>>>>>>>>
>>>>>>>> Note that a commit nearly doubles the memory required, because you have two
>>>>>>>> live Searcher objects with all their caches. Make sure you have headroom
>>>>>>>> for a commit.
>>>>>>>>
>>>>>>>> If you want to test the tenured space usage, you must test with real world
>>>>>>>> queries. Those are the only way to get accurate cache eviction rates.
>>>>>>>>
>>>>>>>> wunder
>>>>>>>>
>>>>> --
>>>>> - Mark
>>>>>
>>>>> http://www.lucidimagination.com


-- 
- Mark

http://www.lucidimagination.com
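
The two flag suggestions made in the quoted replies (drop -XX:MaxGCPauseMillis when running CMS, and start CMS earlier with -XX:CMSInitiatingOccupancyFraction) can be applied to an options string as sketched below. This is only an illustration under stated assumptions: the helper name `adjust_cms_opts` is hypothetical, the value 50 is just the example figure from the thread rather than a recommendation, and -XX:SurvivorRatio is left untouched because no replacement value was suggested.

```python
def adjust_cms_opts(java_opts, occupancy_fraction=50):
    """Return a JAVA_OPTS string with the thread's two CMS suggestions applied.

    Hypothetical helper for illustration; it only edits the string and does
    not validate the flags against any particular JVM version.
    """
    flags = java_opts.split()

    # Only touch the options when the CMS collector is actually selected.
    if "-XX:+UseConcMarkSweepGC" not in flags:
        return " ".join(flags)

    # Per the thread, MaxGCPauseMillis is meant for the Parallel collector,
    # so drop it from a CMS configuration.
    flags = [f for f in flags if not f.startswith("-XX:MaxGCPauseMillis=")]

    # Remove any existing occupancy setting, then add the requested one so
    # the concurrent collection starts earlier.
    prefix = "-XX:CMSInitiatingOccupancyFraction="
    flags = [f for f in flags if not f.startswith(prefix)]
    flags.append(prefix + str(occupancy_fraction))

    return " ".join(flags)


# The options reported earlier in the thread.
ORIGINAL_OPTS = (
    "-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ParallelGCThreads=8 "
    "-XX:SurvivorRatio=2 -XX:NewSize=128m -XX:MaxNewSize=512m "
    "-XX:MaxGCPauseMillis=200"
)

if __name__ == "__main__":
    print(adjust_cms_opts(ORIGINAL_OPTS))
```

Whether 50 is a sensible threshold depends on how fast tenured space fills under real load, so the GC log is still the thing to watch after changing it.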



