lucene-solr-user mailing list archives

From Björn Häuser <bjoernhaeu...@gmail.com>
Subject Re: Heap Memory Problem after Upgrading to 7.4.0
Date Mon, 03 Sep 2018 20:37:49 GMT
Hi,


> On 3. Sep 2018, at 22:18, Erick Erickson <erickerickson@gmail.com> wrote:
> 
> Reducing to 10 won't be definitive, but if the problem gets better
> it'll be a clue.
> 
> How are you committing? Is it just based on the solrconfig settings or
> do you have any clients submitting commit commands?

Only through the auto commits, no manual committing from the application.

> 
> One fat clue would be if, in your solr logs, you were getting any
> warnings about "too many on deck searchers" (going from memory here,
> exact wording may differ). That's an indication that your autowarm
> times are taking longer than 20 seconds (your soft commit interval),
> which would point to excessive autowarming being _part_ of the
> problem. This assumes you're indexing steadily.
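
The following is a minimal Python sketch that makes the "warm-up longer than the soft commit interval" point concrete. It assumes that a new searcher opens at every soft commit and that each one takes a fixed time to warm. It counts how many searchers would be warming at the same moment, in addition to the searcher currently serving queries. It deliberately ignores Solr's own cap on concurrent warming searchers (`maxWarmingSearchers` in solrconfig.xml). The function name is hypothetical.

```python
import math


def max_overlapping_warmups(commit_interval_s: float, warm_time_s: float,
                            horizon_s: float) -> int:
    """Largest number of searchers warming at the same moment (simplified model).

    Assumption: a new searcher opens at every soft commit (t = 0, interval,
    2*interval, ...) and warms for exactly warm_time_s seconds, i.e. it is
    "on deck" during the half-open window [start, start + warm_time_s).
    """
    if commit_interval_s <= 0:
        raise ValueError("commit interval must be positive")
    starts = []
    t = 0.0
    while t < horizon_s:
        starts.append(t)
        t += commit_interval_s
    # The maximum overlap of half-open windows always occurs at some window start.
    best = 0
    for s in starts:
        warming = sum(1 for t0 in starts if t0 <= s < t0 + warm_time_s)
        best = max(best, warming)
    return best


if __name__ == "__main__":
    # 20 s soft commit interval, as mentioned in the thread.
    for warm in (5, 20, 45, 90):
        simulated = max_overlapping_warmups(20, warm, 600)
        print(f"warm-up {warm:>3} s -> up to {simulated} warming searchers "
              f"(closed form: {math.ceil(warm / 20)})")
```

Under this simplified model the answer is ceil(warm / interval). With a 20 s soft commit, any warm-up above 20 s therefore means two or more warming searchers per replica, each with its own caches.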

I searched our logs and could not find any evidence for this. I searched for:

- searchers
- auto
- warmup

There was nothing about too many searchers. Would that mean they are actually leaking, rather
than too many of them warming up at once?
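
If the log search was case-sensitive, "searchers" would not match camelCase tokens such as `onDeckSearchers`. The following is a minimal Python sketch that scans log files case-insensitively. The two patterns are assumptions, based on Solr's "PERFORMANCE WARNING: Overlapping onDeckSearchers=..." warning and its "exceeded limit of maxWarmingSearchers" error as recalled from SolrCore. Verify them against your version's actual log output before treating a zero count as proof. The function name and log path are hypothetical.

```python
import re
import sys
from pathlib import Path
from typing import Dict, Iterable

# Assumed patterns for Solr's warnings about overlapping/excess warming searchers.
PATTERNS = {
    "overlapping_on_deck": re.compile(r"overlapping\s+ondecksearchers", re.IGNORECASE),
    "max_warming_exceeded": re.compile(r"maxwarmingsearchers", re.IGNORECASE),
}


def count_warming_warnings(lines: Iterable[str]) -> Dict[str, int]:
    """Count log lines matching each pattern, ignoring case."""
    counts = {name: 0 for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    # Usage (hypothetical path): python scan_logs.py /var/solr/logs/solr.log
    for path in sys.argv[1:]:
        with Path(path).open(encoding="utf-8", errors="replace") as handle:
            print(path, count_warming_warnings(handle))
```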

> 
> Still, though, changing from 6.6 to 7x shouldn't be that much different.
> 
> It's possible that you were running close to your heap limit with 6.6
> and a relatively small difference in heap usage with 7x threw you over
> the tipping point, but that's just hand-waving on my part.
> 

I really thought about this, but in our 6.6 days we had a lot of headroom in the young
generation and also very low GC times.


> And I'm guessing this is a prod system so experiments aren't tolerable…

What do you have in mind? Increasing memory? That's something we have to do anyway, if it helps.
Our current setup is not very stable anyway, so we have some room for experiments.

> 
> What you can do is measure. Starting with 6.4 there are about a zillion metrics,
> try: http://host:port/solr/admin/metrics for the complete list and
> pick and choose.
> 
> Note that there are ways to cut down on how much is reported. I
> suspect you'll be interested first in:
> http://localhost:8983/solr/admin/metrics?prefix=SEARCHER
> 
> https://lucene.apache.org/solr/guide/7_1/metrics-reporting.html
> 

Funnily enough, we tried to use the Prometheus exporter for these metrics, but whenever
we started it, it killed our Solr node immediately.

I will look into these metrics more closely, but so far looking at them has not told me
anything useful. All metrics look “fine”.

Is there anything special you would take a look at?

> These tend to be on a per-core (replica) basis so you may have to do
> some aggregating.
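
Because the metrics are reported per core (replica), a small aggregation step helps. The following is a minimal Python sketch. It assumes the JSON layout the metrics API returns with `wt=json`: a "metrics" object keyed by registry name, where SolrCloud core registries are named `solr.core.<collection>.<shard>.<replica>`. The metric name in the usage note, `SEARCHER.searcher.warmupTime`, is also an assumption. Check it against the output of the `prefix=SEARCHER` query. The function names are hypothetical.

```python
import json
from typing import Dict, List
from urllib.request import urlopen


def fetch_metrics(base_url: str, prefix: str = "SEARCHER") -> dict:
    """GET <base_url>/solr/admin/metrics?prefix=...&wt=json and decode the JSON body."""
    url = f"{base_url.rstrip('/')}/solr/admin/metrics?prefix={prefix}&wt=json"
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def values_by_collection(metrics_response: dict, metric: str) -> Dict[str, List[float]]:
    """Group one numeric per-core metric by collection.

    Assumes core registries are named "solr.core.<collection>.<shard>.<replica>";
    other registries (e.g. "solr.jvm") and non-numeric values are skipped.
    """
    grouped: Dict[str, List[float]] = {}
    for registry, values in metrics_response.get("metrics", {}).items():
        parts = registry.split(".")
        if len(parts) < 3 or parts[0] != "solr" or parts[1] != "core":
            continue
        value = values.get(metric)
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            grouped.setdefault(parts[2], []).append(value)
    return grouped
```

For example, taking `max(v)` over each list from `values_by_collection(fetch_metrics("http://localhost:8983"), "SEARCHER.searcher.warmupTime")` would give the slowest current-searcher warm-up per collection on that node. You can then compare it with the soft commit interval, once you have checked the unit.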
> 
> Good luck!


Thank you very much :)
Björn

> Erick
> On Mon, Sep 3, 2018 at 12:54 PM Markus Jelsma
> <markus.jelsma@openindex.io> wrote:
>> 
>> Hello,
>> 
>> Getting an OOM plus the fact you are having a lot of IndexSearcher instances rings
>> a familiar bell. One of our collections had the same issue [1] when we attempted an
>> upgrade from 7.2.1 to 7.3.0. I managed to rule out all our custom Solr code but had to
>> keep our Lucene filters in the schema; the problem persisted.
>> 
>> The odd thing, however, is that you appear to have the same problem, but not with
>> 7.3.0? Since you upgraded to 7.4.0 shortly after 7.3.0, can you confirm the problem is
>> not also present in 7.3.0?
>> 
>> You should see the instance count for IndexSearcher increase by one for each replica
>> on each commit.
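
You can check this without taking a full heap dump each time. One option is to compare class histograms from the JDK's `jmap -histo:live <pid>` before and after a few commits. The following is a minimal Python sketch. It assumes the usual histogram layout (`rank: #instances #bytes class name`); newer JDKs append a module column, which the regex ignores. Note that `-histo:live` triggers a full GC, which may be noticeable on a production node. The function names are hypothetical.

```python
import re
import subprocess
from typing import Optional

# Histogram rows look like "  87:   91   8736  org.apache.solr.search.SolrIndexSearcher"
# (illustrative values); anything after the class name is ignored.
ROW = re.compile(r"^\s*\d+:\s+(\d+)\s+(\d+)\s+(\S+)")

SEARCHER_CLASS = "org.apache.solr.search.SolrIndexSearcher"


def take_histogram(pid: int) -> str:
    """Run `jmap -histo:live <pid>` (forces a full GC) and return its output."""
    result = subprocess.run(["jmap", "-histo:live", str(pid)],
                            capture_output=True, text=True, check=True)
    return result.stdout


def instance_count(histogram: str, class_name: str = SEARCHER_CLASS) -> Optional[int]:
    """Return the #instances column for class_name, or None if it is absent."""
    for line in histogram.splitlines():
        match = ROW.match(line)
        if match and match.group(3) == class_name:
            return int(match.group(1))
    return None
```

Call `instance_count(take_histogram(pid))` after each autocommit and watch whether the number keeps climbing instead of settling near the number of replicas on that node. This is a cheap leak signal.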
>> 
>> Regards,
>> Markus
>> 
>> [1] http://lucene.472066.n3.nabble.com/RE-7-3-appears-to-leak-td4396232.html
>> 
>> 
>> 
>> -----Original message-----
>>> From:Erick Erickson <erickerickson@gmail.com>
>>> Sent: Monday 3rd September 2018 20:49
>>> To: solr-user <solr-user@lucene.apache.org>
>>> Subject: Re: Heap Memory Problem after Upgrading to 7.4.0
>>> 
>>> I would expect at least 1 IndexSearcher per replica, how many total
>>> replicas hosted in your JVM?
>>> 
>>> Plus, if you're actively indexing, there may temporarily be 2
>>> IndexSearchers open while the new searcher warms.
>>> 
>>> And there may be quite a few caches, at least queryResultCache and
>>> filterCache and documentCache, one of each per replica and maybe two
>>> (for queryResultCache and filterCache) if you have a background
>>> searcher autowarming.
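
Erick's rule of thumb can be turned into rough per-node numbers for the layout Björn describes further down (5 nodes, 25 collections, 1 shard each, 4 replicas each). The following is a minimal sketch. It assumes replicas are spread evenly over the nodes and counts only the three caches named above, so the cache figures are lower bounds. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ClusterLayout:
    nodes: int
    collections: int
    shards_per_collection: int
    replicas_per_shard: int

    def replicas_per_node(self) -> float:
        # Assumption: replicas are spread evenly over the nodes.
        total = self.collections * self.shards_per_collection * self.replicas_per_shard
        return total / self.nodes


def expected_per_node(layout: ClusterLayout) -> Dict[str, float]:
    """Rough per-JVM counts from the rule of thumb in this thread.

    One searcher per replica, two while a new searcher autowarms; three caches
    per replica (queryResultCache, filterCache, documentCache) plus two more
    (queryResultCache, filterCache) for the warming searcher.
    """
    r = layout.replicas_per_node()
    return {
        "replicas": r,
        "searchers_steady": r,
        "searchers_while_warming": 2 * r,
        "caches_steady": 3 * r,
        "caches_while_warming": 5 * r,
    }


if __name__ == "__main__":
    layout = ClusterLayout(nodes=5, collections=25, shards_per_collection=1, replicas_per_shard=4)
    for name, value in expected_per_node(layout).items():
        print(f"{name:>24}: {value:g}")
```

For that layout this gives 20 replicas per node, so roughly 20 searchers normally and up to 40 while warming. If the heap dump with 91 SolrIndexSearcher instances came from a single node, that is well above what this rough model predicts.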
>>> 
>>> At a glance, your autowarm counts are very high, so it may take some
>>> time to autowarm leading to multiple IndexSearchers and caches open
>>> per replica when you happen to hit a commit point. I usually start
>>> with 16-20 as an autowarm count; the benefit decreases rapidly as you
>>> increase the count.
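
Erick's 16-20 starting point is easy to check mechanically. The following is a minimal Python sketch that lists the cache elements under `<query>` in a solrconfig.xml whose `autowarmCount` exceeds a threshold. The default threshold of 20 follows the suggestion above. The sample values in the test are made up and not taken from the linked gist. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def high_autowarm_counts(solrconfig_xml: str, threshold: int = 20) -> List[Tuple[str, str]]:
    """Return (element tag, autowarmCount) pairs whose count exceeds threshold.

    Checks every child of <query> that carries an autowarmCount attribute.
    Values that are not plain integers (e.g. percentages, if your cache
    implementation accepts them) are reported as-is for manual review.
    """
    root = ET.fromstring(solrconfig_xml)
    flagged = []
    for query in root.iter("query"):
        for cache in query:
            raw = cache.get("autowarmCount")
            if raw is None:
                continue
            try:
                too_high = int(raw) > threshold
            except ValueError:
                too_high = True  # cannot judge automatically; surface it
            if too_high:
                flagged.append((cache.tag, raw))
    return flagged
```

Pass it the text of the collection's solrconfig.xml, for example a copy downloaded from ZooKeeper.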
>>> 
>>> I'm not quite sure why it would be different in 7x vs. 6x. How much
>>> heap do you allocate to the JVM? And do you see similar heap dumps in
>>> 6.6?
>>> 
>>> Best,
>>> Erick
>>> On Mon, Sep 3, 2018 at 10:33 AM Björn Häuser <bjoernhaeuser@gmail.com> wrote:
>>>> 
>>>> Hello,
>>>> 
>>>> we recently upgraded our SolrCloud cluster (5 nodes, 25 collections, 1 shard each,
>>>> 4 replicas each) from 6.6.0 to 7.3.0 and shortly after to 7.4.0. We are running
>>>> Zookeeper 4.1.13.
>>>> 
>>>> Since the upgrade to 7.3.0, and also with 7.4.0, we have been running into heap space
>>>> exhaustion. After obtaining a heap dump, it looks like we have a lot of IndexSearchers
>>>> open for our largest collection.
>>>> 
>>>> The dump contains ~60 IndexSearchers, each holding ~40 MB of heap. Another 500 MB of
>>>> heap is the fieldcache, which is expected, in my opinion.
>>>> 
>>>> The current config can be found here: https://gist.github.com/bjoernhaeuser/327a65291ac9793e744b87f0a561e844
>>>> 
>>>> Analyzing the heap dump, Eclipse MAT reports the following:
>>>> 
>>>> Problem Suspect 1
>>>> 
>>>> 91 instances of "org.apache.solr.search.SolrIndexSearcher", loaded by "org.eclipse.jetty.webapp.WebAppClassLoader @ 0x6807d1048" occupy 1.981.148.336 (38,26%) bytes.
>>>> 
>>>> Biggest instances:
>>>> 
>>>>        • org.apache.solr.search.SolrIndexSearcher @ 0x6ffd47ea8 - 70.087.272 (1,35%) bytes.
>>>>        • org.apache.solr.search.SolrIndexSearcher @ 0x79ea9c040 - 65.678.264 (1,27%) bytes.
>>>>        • org.apache.solr.search.SolrIndexSearcher @ 0x6855ad680 - 63.050.600 (1,22%) bytes.
>>>> 
>>>> 
>>>> Problem Suspect 2
>>>> 
>>>> 223 instances of "org.apache.solr.util.ConcurrentLRUCache", loaded by "org.eclipse.jetty.webapp.WebAppClassLoader @ 0x6807d1048" occupy 1.373.110.208 (26,52%) bytes.
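
The MAT figures above use '.' as the thousands separator and ',' as the decimal separator. The following is a minimal Python sketch that parses them. It assumes MAT's percentages are relative to the total heap in the dump, and divides each suspect's retained bytes by its share to recover that total. Both suspects should imply the same total, which is a quick sanity check on the numbers. The helper names are hypothetical. The two retained sizes should not simply be added, because each SolrIndexSearcher references its own caches, and the cache bytes may already be counted in the searchers' retained size.

```python
def parse_mat_bytes(text: str) -> int:
    """Parse a MAT byte count such as '1.981.148.336' ('.' = thousands separator)."""
    return int(text.replace(".", ""))


def parse_mat_share(text: str) -> float:
    """Parse a MAT percentage such as '38,26%' into a fraction (',' = decimal separator)."""
    return float(text.strip().rstrip("%").replace(",", ".")) / 100


def implied_dump_size(retained: str, share: str) -> float:
    """Total heap in the dump implied by one suspect (assumes share is of the whole dump)."""
    return parse_mat_bytes(retained) / parse_mat_share(share)


if __name__ == "__main__":
    via_searchers = implied_dump_size("1.981.148.336", "38,26%")
    via_caches = implied_dump_size("1.373.110.208", "26,52%")
    print(f"dump size via searchers: {via_searchers / 2**30:.2f} GiB")
    print(f"dump size via caches:    {via_caches / 2**30:.2f} GiB")
    avg = parse_mat_bytes("1.981.148.336") / 91
    print(f"average retained per SolrIndexSearcher: {avg / 2**20:.1f} MiB")
```

With these figures, both suspects imply a dump of roughly 4.8 GiB. The average retained size per SolrIndexSearcher comes out at about 20.8 MiB.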
>>>> 
>>>> 
>>>> Any help is appreciated. Thank you very much!
>>>> Björn
>>> 

