From Mark Miller <>
Subject Re: Solr and Garbage Collection
Date Fri, 02 Oct 2009 14:14:51 GMT
siping liu wrote:
> Hi,
> I read pretty much all posts on this thread (before and after this one). Looks like the
main suggestion from you and others is to keep max heap size (-Xmx) as small as possible (as
long as you don't see OOM exception). This brings more questions than answers (for me at least.
I'm new to Solr).
> First, our environment and the problem encountered: Solr 1.4 (nightly build, downloaded about
2 months ago), Sun JDK 1.6, Tomcat 5.5, running on Solaris (multi-CPU/cores). The cache settings
are from the default solrconfig.xml (they look very small). At first we used minimal JAVA_OPTS
and quickly ran into a problem similar to the one the original poster reported -- long pauses
(seconds to minutes) under load test. jconsole showed that it pauses on GC. So more JAVA_OPTS
got added: "-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ParallelGCThreads=8 -XX:SurvivorRatio=2
-XX:NewSize=128m -XX:MaxNewSize=512m -XX:MaxGCPauseMillis=200". The thinking is that with multiple CPUs/cores
we can get GC over with as quickly as possible. With the new setup, it works fine until Tomcat
reaches the heap size, then it blocks and takes minutes on a "full GC" to get more space from the "tenured
generation". We tried different Xmx values (from very small to large), with no difference in the long GC time.
We never ran into OOM.
MaxGCPauseMillis doesn't work with UseConcMarkSweepGC - it's for use with
the Parallel collector. That also doesn't look like a good SurvivorRatio.
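
For reference, here is a rough sketch of what -XX:SurvivorRatio=2 means for
the young generation sizes in the quoted options. It assumes the usual
HotSpot definition (SurvivorRatio is eden divided by one survivor space, and
the young generation is eden plus two survivor spaces). The function name is
made up for illustration, and the ratio of 8 is only a comparison value, not
a recommendation. Python is used just to keep the arithmetic short.

```python
def young_gen_layout(young_mb, survivor_ratio):
    """Split a young generation into eden and two survivor spaces.

    Assumes SurvivorRatio = eden / one survivor space, and
    young = eden + 2 * survivor. A real JVM rounds these sizes
    to its own alignment boundaries, so treat the result as approximate.
    """
    survivor = young_mb / (survivor_ratio + 2)
    eden = young_mb - 2 * survivor
    return eden, survivor


for young_mb in (128, 512):      # NewSize and MaxNewSize from the quoted options
    for ratio in (2, 8):         # 2 is from the quoted options; 8 is only for comparison
        eden, survivor = young_gen_layout(young_mb, ratio)
        print(f"young={young_mb}m ratio={ratio}: "
              f"eden={eden:.1f}m, each survivor={survivor:.1f}m")
```

Under that assumption, SurvivorRatio=2 hands half of the young generation to
the two survivor spaces, which leaves only 64m of eden at NewSize=128m.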
> Questions:
> * In general, caching is good for performance. We have more RAM to use and want
to use more caching to boost performance. Isn't your suggestion (of lowering the heap limit) going
against that?
Leaving RAM for the filesystem cache is also very important. But you
should also have enough RAM for your Solr caches, of course.
> * Looks like the Solr caches made their way into the tenured generation on the heap, that's good. But
why do they get GC'ed eventually? I did a quick check of the Solr code (Solr 1.3, not 1.4), and
saw a single instance of using WeakReference. Is that what is causing all this? This seems
to suggest a design flaw in Solr's memory management strategy (or just my ignorance about
Solr?). I mean, wouldn't this be the "right" way of doing it -- you allow the user to specify
the cache size in solrconfig.xml, then the user can set up the heap limit in JAVA_OPTS accordingly,
with no need to use WeakReference (BTW, why not SoftReference)?
Do you see a concurrent mode failure when looking at your GC logs? E.g.:

174.445: [GC 174.446: [ParNew: 66408K->66408K(66416K), 0.0000618
secs]174.446: [CMS (concurrent mode failure): 161928K->162118K(175104K),
4.0975124 secs] 228336K->162118K(241520K)

That means you are still getting major collections with CMS, and you
don't want that. You might try kicking GC off earlier with something
like: -XX:CMSInitiatingOccupancyFraction=50
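
If the GC logs are large, a small script can pick out these entries. The
following is a minimal sketch, not a general GC log parser: it assumes the
entries look like the sample above (the exact format varies between JVM
versions and logging flags), and the function and variable names are
hypothetical.

```python
import re

# Matches the CMS part of an entry shaped like the sample above:
#   [CMS (concurrent mode failure): 161928K->162118K(175104K), 4.0975124 secs]
CMF = re.compile(
    r"\(concurrent mode failure\): (\d+)K->(\d+)K\((\d+)K\), ([\d.]+) secs"
)


def find_concurrent_mode_failures(log_text):
    """Return (before_kb, after_kb, capacity_kb, pause_secs) per failure."""
    # Entries may be wrapped across lines (as in this mail), so collapse
    # all runs of whitespace into single spaces before matching.
    flat = " ".join(log_text.split())
    return [
        (int(before), int(after), int(capacity), float(secs))
        for before, after, capacity, secs in CMF.findall(flat)
    ]


# The sample entry from above, wrapped the same way.
sample = """174.445: [GC 174.446: [ParNew: 66408K->66408K(66416K), 0.0000618
secs]174.446: [CMS (concurrent mode failure): 161928K->162118K(175104K),
4.0975124 secs] 228336K->162118K(241520K)"""

for before, after, capacity, secs in find_concurrent_mode_failures(sample):
    print(f"tenured {before}K -> {after}K of {capacity}K, paused {secs:.2f}s")
```

In the sample entry the tenured generation is almost full both before and
after the collection, and the pause is about 4 seconds.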
> * Right now I have a single Tomcat hosting Solr and other applications. I guess now it's
better to have Solr on its own Tomcat, given that it's tricky to adjust the java options.
> thanks.
>> From:
>> To:
>> Subject: RE: Solr and Garbage Collection
>> Date: Fri, 25 Sep 2009 09:51:29 -0700
>> 30ms is not better or worse than 1s until you look at the service
>> requirements. For many applications, it is worth dedicating 10% of your
>> processing time to GC if that makes the worst-case pause short.
>> On the other hand, my experience with the IBM JVM was that the maximum query
>> rate was 2-3X better with the concurrent generational GC compared to any of
>> their other GC algorithms, so we got the best throughput along with the
>> shortest pauses.
>> Solr garbage generation (for queries) seems to have two major components:
>> per-request garbage and cache evictions. With a generational collector,
>> these two are handled by separate parts of the collector. Per-request
>> garbage should completely fit in the short-term heap (nursery), so that it
>> can be collected rapidly and returned to use for further requests. If the
>> nursery is too small, the per-request allocations will be made in tenured
>> space and sit there until the next major GC. Cache evictions are almost
>> always in long-term storage (tenured space) because an LRU algorithm
>> guarantees that the garbage will be old.
>> Check the growth rate of tenured space (under constant load, of course)
>> while increasing the size of the nursery. That rate should drop when the
>> nursery gets big enough, then not drop much further as it is increased more.
>> After that, reduce the size of tenured space until major GCs start happening
>> "too often" (a judgment call). A bigger tenured space means longer major GCs
>> and thus longer pauses, so you don't want it oversized by too much.
>> Also check the hit rates of your caches. If the hit rate is low, say 20% or
>> less, make that cache much bigger or set it to zero. Either one will reduce
>> the number of cache evictions. If you have an HTTP cache in front of Solr,
>> zero may be the right choice, since the HTTP cache is cherry-picking the
>> easily cacheable requests.
>> Note that a commit nearly doubles the memory required, because you have two
>> live Searcher objects with all their caches. Make sure you have headroom for
>> a commit.
>> If you want to test the tenured space usage, you must test with real world
>> queries. Those are the only way to get accurate cache eviction rates.
>> wunder

- Mark

