lucene-solr-user mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: OOM killer script woes
Date Tue, 02 Jul 2013 15:36:06 GMT
Please file a JIRA issue so that we can address this.

- Mark

On Jul 2, 2013, at 6:20 AM, Daniel Collins <danwcollins@gmail.com> wrote:

> On looking at the code in SolrDispatchFilter, is this intentional or not?
> I think I remember Mark Miller mentioning that in an OOM case, the best
> course of action is basically to kill the process, there is very little
> Solr can do once it has run out of memory.  Yet it seems that Solr catches
> the OOM itself and just logs it as an error, rather than letting it go back
> up to the JVM.
> 
> We have also seen OOMs in IndexWriter, which has specific code to handle
> OOM cases and seems to fall back to the transaction log (but fails to
> commit anything).  I understand the logic of that, but in reality, I've
> seen that the tlog can get corrupted in this case, so we still need to
> monitor the system and forcibly kill the process.
> 
> 
> 
> On 27 June 2013 00:03, Timothy Potter <thelabdude@gmail.com> wrote:
> 
>> Thanks for the feedback Daniel ... For now, I've opted to just kill
>> the JVM with System.exit(1) in the SolrDispatchFilter code and will
>> restart it with a Linux supervisor. Not elegant but the alternative of
>> having a zombie Solr instance walking around my cluster is much worse
>> ;-) Will try to dig into the code that is trapping this error but for
>> now I've lost too many hours on this problem.
>> 
>> Cheers,
>> Tim
>> 
>> On Wed, Jun 26, 2013 at 2:43 PM, Daniel Collins <danwcollins@gmail.com>
>> wrote:
>>> Ooh, I guess Jetty is trapping that java.lang.OutOfMemoryError, and
>>> throwing it/packaging it as a java.lang.RuntimeException.  The -XX option
>>> assumes that the application doesn't handle the Errors and so they would
>>> reach the JVM and thus invoke the handler.
>>> Since Jetty has an exception handler that deals with anything
>>> (including Errors), they never reach the JVM, hence no handler.
>>> 
>>> Not much we can do short of not using Jetty?
>>> 
>>> That's a pain, I'd just written a nice OOM handler too!
>>> 
>>> 
>>> On 26 June 2013 20:37, Timothy Potter <thelabdude@gmail.com> wrote:
>>> 
>>>> A little more to this ...
>>>> 
>>>> Just on the off chance this was a weird Jetty issue or something, I tried
>>>> with the latest 9.... and the problem still occurs :-(
>>>> 
>>>> This is on Java 7 on debian:
>>>> 
>>>> java version "1.7.0_21"
>>>> Java(TM) SE Runtime Environment (build 1.7.0_21-b11)
>>>> Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode)
>>>> 
>>>> Here is an example stack trace from the log
>>>> 
>>>> 2013-06-26 19:31:33,801 [qtp632640515-62] ERROR solr.servlet.SolrDispatchFilter Q:22 - null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
>>>> at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:670)
>>>> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:380)
>>>> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
>>>> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1423)
>>>> at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:450)
>>>> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:138)
>>>> at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:564)
>>>> at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:213)
>>>> at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1083)
>>>> at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:379)
>>>> at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:175)
>>>> at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1017)
>>>> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:136)
>>>> at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:258)
>>>> at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:109)
>>>> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
>>>> at org.eclipse.jetty.server.Server.handle(Server.java:445)
>>>> at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:260)
>>>> at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:225)
>>>> at org.eclipse.jetty.io.AbstractConnection$ReadCallback.run(AbstractConnection.java:358)
>>>> at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:596)
>>>> at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:527)
>>>> at java.lang.Thread.run(Thread.java:722)
>>>> Caused by: java.lang.OutOfMemoryError: Java heap space
>>>> 
>>>> On Wed, Jun 26, 2013 at 12:27 PM, Timothy Potter <thelabdude@gmail.com>
>>>> wrote:
>>>>> Recently upgraded to 4.3.1 but this problem has persisted for a while
>>>>> now ...
>>>>> 
>>>>> I'm using the following configuration when starting Jetty:
>>>>> 
>>>>> -XX:OnOutOfMemoryError="/home/solr/oom_killer.sh 83 %p"
>>>>> 
>>>>> If an OOM is triggered during Solr web app initialization (such as by
>>>>> me lowering -Xmx to a value that is too low to initialize Solr with),
>>>>> then the script gets called and does what I expect!
>>>>> 
>>>>> However, once the Solr webapp has initialized and Solr is happily
>>>>> responding to updates and queries, the script doesn't actually get
>>>>> invoked when an OOM occurs! All I see is the following in the
>>>>> stdout/stderr log of my process:
>>>>> 
>>>>> #
>>>>> # java.lang.OutOfMemoryError: Java heap space
>>>>> # -XX:OnOutOfMemoryError="/home/solr/oom_killer.sh 83 %p"
>>>>> #   Executing /bin/sh -c "/home/solr/oom_killer.sh 83 21358"...
>>>>> 
>>>>> The oom_killer.sh script doesn't actually get called!
>>>>> 
>>>>> So to recap, it works if an OOM occurs during initialization but once
>>>>> Solr is running, the OOM killer doesn't fire correctly. This leads me
>>>>> to believe my script is fine and there's something else going wrong.
>>>>> Here's the oom_killer.sh script (pretty basic):
>>>>> 
>>>>> #!/bin/bash
>>>>> SOLR_PORT=$1
>>>>> SOLR_PID=$2
>>>>> NOW=$(date +"%Y%m%d_%H%M")
>>>>> (
>>>>> echo "Running OOM killer script for process $SOLR_PID for Solr on port 89$SOLR_PORT"
>>>>> kill -9 $SOLR_PID
>>>>> echo "Killed process $SOLR_PID"
>>>>> exec /home/solr/solr-dg/dg-solr.sh recover $SOLR_PORT &
>>>>> echo "Restarted Solr on 89$SOLR_PORT after OOM"
>>>>> ) | tee oom_killer-89$SOLR_PORT-$NOW.log
>>>>> 
>>>>> Anyone see anything like this before? Suggestions on where to begin
>>>>> tracking down this issue?
>>>>> 
>>>>> Cheers,
>>>>> Tim
>>>> 
>> 
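
The workaround discussed in this thread (exit the JVM when the filter sees an OutOfMemoryError that has been wrapped in a RuntimeException, as in the stack trace above) could be sketched roughly as follows. This is only an illustration, not the actual SolrDispatchFilter code: the class name OomGuard and the methods causedByOom and exitIfOom are hypothetical, and the sketch uses Runtime.halt(1) where the thread mentions System.exit(1), on the assumption that skipping shutdown hooks is preferable when the heap is already exhausted.

```java
public class OomGuard {

    /** Returns true if t, or any throwable in its cause chain, is an OutOfMemoryError. */
    static boolean causedByOom(Throwable t) {
        // Bound the walk so that a cyclic cause chain cannot loop forever.
        for (int depth = 0; t != null && depth < 32; depth++, t = t.getCause()) {
            if (t instanceof OutOfMemoryError) {
                return true;
            }
        }
        return false;
    }

    /**
     * Hypothetical hook: call this from the catch block that would
     * otherwise only log the error and keep the process alive.
     */
    static void exitIfOom(Throwable t) {
        if (causedByOom(t)) {
            // halt() terminates immediately without running shutdown hooks,
            // which might themselves need memory that is no longer available.
            Runtime.getRuntime().halt(1);
        }
    }

    public static void main(String[] args) {
        // Same shape as the logged error: a RuntimeException wrapping an OOM.
        Throwable wrapped = new RuntimeException(new OutOfMemoryError("Java heap space"));
        System.out.println(causedByOom(wrapped));
        System.out.println(causedByOom(new RuntimeException("unrelated")));
    }
}
```

An external supervisor would still be needed to restart the process after it exits, as described earlier in the thread.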

