lucene-solr-user mailing list archives

From: Tim Chen <Tim.C...@sbs.com.au>
Subject: RE: Solr Cloud with 5 servers cluster failed due to Leader out of memory
Date: Mon, 08 Aug 2016 00:53:01 GMT
Hi Erick, Shawn,

Thanks for following this up.

1,
For some reason, ramBufferSizeMB in our solrconfig.xml is not set to the default 100MB, but to 32MB.
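For reference, the relevant setting in our solrconfig.xml would look something like this (a sketch from memory; the surrounding elements may differ in our actual config):

    <indexConfig>
      <!-- RAM buffer for indexing; flushed to a new segment once exceeded -->
      <ramBufferSizeMB>32</ramBufferSizeMB>
    </indexConfig>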

In that case, considering we have 10G for the JVM, my understanding is that we should not run
out of memory just because a large number of documents are being added to Solr.

Just to make sure I understand it correctly: documents added to Solr are stored in an internal
queue, and Solr will only use that 32MB (or 99% of 32MB plus the memory of one extra document)
for indexing. The documents in the queue are indexed one by one.

2,
Based on our tomcat (Solr) access_log and our website's peak hours, the cluster failure is
unlikely to have been caused by _searching_ traffic. E.g., we saw many more Solr requests than
usual with the 'update' keyword, but the usual number of requests with the 'select' keyword.

3,
Now, this leads me to the only cause I can think of (you mentioned this earlier as well):
since each shard has 4 replicas in our setup, when a large number of documents are being
added, the Leader creates a lot of threads to send the documents to the other replica servers.
All these threads are what consumed the memory on the Leader server and led to the OOM.

If my assumption is right, the ways to fix or work around this issue are to:
a): still limit the number of documents being added to Solr;
b): change to 2 replicas for each shard (losing some data redundancy, but..) - see the sketch below;
c): bump up server memory.
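For (b), my understanding is that the Collections API can drop a replica per shard with something like this (the collection/shard/replica names are placeholders, not our real ones):

    curl 'http://localhost:8983/solr/admin/collections?action=DELETEREPLICA&collection=mycollection&shard=shard1&replica=core_node5'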

Am I on the right track? Any advice or suggestions are much appreciated!

Also attached is part of the catalina.out OOM log for reference:

Exception in thread "http-bio-8983-exec-6571" java.lang.OutOfMemoryError: unable to create
new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Thread.java:714)
        at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:949)
        at java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1017)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1163)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
        at java.lang.Thread.run(Thread.java:745)
(The identical "unable to create new native thread" stack trace repeats for threads
"http-bio-8983-exec-6861" and "http-bio-8983-exec-6671".)

Many thanks,
Tim


-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com]
Sent: Saturday, 6 August 2016 2:31 AM
To: solr-user
Subject: Re: Solr Cloud with 5 servers cluster failed due to Leader out of memory

You don't really have to worry that much about memory consumed during indexing.
The ramBufferSizeMB setting in solrconfig.xml pretty much limits the amount of RAM consumed:
when adding a doc, if that limit is exceeded, the buffer is flushed.

So you can reduce that number, but its default is 100MB, and if you're running that close to
your limits I suspect you'd get, at best, a bit more runway before you hit the problem again.

NOTE: that number isn't an absolute limit. IIUC the algorithm is: index a doc into the
in-memory structures, then check whether the limit is exceeded and flush if so.

So say you were at 99% of your ramBufferSizeMB setting and then indexed a ginormous doc; your
in-memory structures might end up significantly bigger than the limit.
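In rough Java-flavoured pseudocode, my mental model (not the actual Lucene code) is:

    void addDocument(Document doc) {
        buffer.add(doc);                                // the doc is indexed into RAM first...
        if (buffer.ramBytesUsed() > ramBufferMB * 1024L * 1024L) {
            buffer.flushToSegment();                    // ...and only then is the limit checked,
        }                                               // so one huge doc can overshoot it
    }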

Searching usually is the bigger RAM consumer, so when I say "a bit more runway" what I'm thinking
about is that when you start _searching_ the data your memory requirements will continue to
grow and you'll be back where you started.

And just as a sanity check: You didn't perchance increase the maxWarmingSearchers parameter
in solrconfig.xml, did you? If so, that's really a red flag.
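For reference, the stock config looks like this, and the default of 2 is usually right:

    <maxWarmingSearchers>2</maxWarmingSearchers>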

Best,
Erick

On Fri, Aug 5, 2016 at 12:41 AM, Tim Chen <Tim.Chen@sbs.com.au> wrote:
> Thanks Guys. Very very helpful.
>
> I will probably look at consolidating the 4 Solr servers into 2 bigger/better servers - that gives
> more memory, and it cuts down the number of replicas the Leader needs to manage.
>
> Also, I may look into writing a script to monitor the tomcat log and, if there is an OOM, kill
> tomcat, then restart it. A bit dirty, but it may work in the short term.
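> Something like this rough sketch is what I have in mind (the log path and restart command are placeholders for our setup):
>
>     #!/bin/sh
>     # Watch catalina.out and restart tomcat whenever an OOM appears.
>     tail -F /var/log/tomcat/catalina.out | while read line; do
>         case "$line" in
>             *OutOfMemoryError*) service tomcat restart ;;
>         esac
>     done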
>
> I don't know too much about how documents are indexed, or how to save memory there. I
> will probably work with a developer on this as well.
>
> Many Thanks guys.
>
> Cheers,
> Tim
>
> -----Original Message-----
> From: Shawn Heisey [mailto:apache@elyograg.org]
> Sent: Friday, 5 August 2016 4:55 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr Cloud with 5 servers cluster failed due to Leader
> out of memory
>
> On 8/4/2016 8:14 PM, Tim Chen wrote:
>> Couple of thoughts: 1, If Leader goes down, it should just go down,
>> like dead down, so other servers can do the election and choose the
>> new leader. This at least avoids bringing down the whole cluster. Am
>> I right?
>
> Supplementing what Erick told you:
>
> When a typical Java program throws OutOfMemoryError, program behavior is completely
> unpredictable. There are programming techniques that can be used so that behavior IS
> predictable, but writing that code can be challenging.
>
> Solr 5.x and 6.x, when they are started on a UNIX/Linux system, use a Java option to
> execute a script when OutOfMemoryError happens.  This script kills Solr completely.  We are
> working on adding this capability when running on Windows.
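> The mechanism is the standard HotSpot flag, along these lines (the script path here is
> illustrative, not the exact one Solr ships; %p expands to the JVM's PID):
>
>     java -XX:OnOutOfMemoryError='/path/to/kill-solr.sh %p' ...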
>
>> 2, Apparently we should not pushing too many documents to Solr, how
>> do you guys handle this? Set a limit somewhere?
>
> There are exactly two ways to deal with OOME problems: increase the heap or reduce Solr's
> memory requirements.  The number of documents you push to Solr is unlikely to have a large
> effect on the amount of memory that Solr requires.  Here's some information on this topic:
>
> https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap
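>
> As an illustration of the first option on a Tomcat install, the heap is typically raised via
> CATALINA_OPTS in bin/setenv.sh (the values here are only an example):
>
>     CATALINA_OPTS="$CATALINA_OPTS -Xms10g -Xmx12g"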
>
> Thanks,
> Shawn
>

