lucene-solr-user mailing list archives

From Salman Akram <salman.ak...@northbaysolutions.net>
Subject Re: Recovering from Out of Mem
Date Fri, 17 Oct 2014 07:11:52 GMT
I know this might sound odd, but is there an easy way to do this on Windows?

On Tue, Oct 14, 2014 at 7:51 PM, Boogie Shafer <Boogie.Shafer@proquest.com>
wrote:

> yago,
>
> you can put more complex restart logic as shown in the examples below or
> just do something similar to the java_oom.sh i posted earlier where you
> just spit out an email alert and deal with service restarts and
> troubleshooting manually
>
>
> e.g. something like the following for a java_error.sh will drop an email
> with a timestamp
>
>
>
> echo `date` | mail -s "Java Error: General - $HOSTNAME" notify@domain.com
>
>
> ________________________________________
> From: Tim Potter <tim.potter@lucidworks.com>
> Sent: Tuesday, October 14, 2014 07:35
> To: solr-user@lucene.apache.org
> Subject: Re: Recovering from Out of Mem
>
> jfyi - the bin/solr script does the following:
>
> -XX:OnOutOfMemoryError="$SOLR_TIP/bin/oom_solr.sh $SOLR_PORT" where
> $SOLR_PORT is the port Solr is bound to, e.g. 8983
>
> The oom_solr.sh script looks like:
>
> SOLR_PORT=$1
>
> SOLR_PID=`ps waux | grep start.jar | grep $SOLR_PORT | grep -v grep | awk '{print $2}' | sort -r`
>
> if [ "$SOLR_PID" == "" ]; then
>   echo "Couldn't find Solr process running on port $SOLR_PORT!"
>   exit
> fi
>
> NOW=$(date +"%F%T")
>
> (
> echo "Running OOM killer script for process $SOLR_PID for Solr on port $SOLR_PORT"
> kill -9 $SOLR_PID
> echo "Killed process $SOLR_PID"
> ) | tee solr_oom_killer-$SOLR_PORT-$NOW.log
>
>
> I usually run Solr behind a supervisor type process (supervisord or
> upstart) that will restart it if the process dies.
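
The restart-on-death behaviour described above can be sketched in plain shell. This is only an illustration of the idea, not how supervisord or upstart are implemented; the function name run_with_restart and its arguments (maximum restarts, delay in seconds, then the command) are hypothetical.

```shell
#!/bin/sh
# run_with_restart MAX_RESTARTS DELAY_SECONDS CMD [ARGS...]
# Hypothetical helper: runs CMD in the foreground and reruns it each time it
# exits with a non-zero status, giving up after MAX_RESTARTS restarts.
run_with_restart() {
    max_restarts=$1
    delay=$2
    shift 2
    restarts=0
    while :; do
        status=0
        "$@" || status=$?
        if [ "$status" -eq 0 ]; then
            return 0    # clean exit: do not restart
        fi
        if [ "$restarts" -ge "$max_restarts" ]; then
            echo "giving up after $restarts restarts (last status $status)" >&2
            return "$status"
        fi
        restarts=$((restarts + 1))
        echo "exited with status $status; restart $restarts of $max_restarts" >&2
        sleep "$delay"
    done
}
```

Assuming Tomcat is started in the foreground, a call might look like `run_with_restart 3 5 "$TOMCAT_DIR/bin/catalina.sh" run`. A JVM killed with `kill -9` exits with a non-zero status, so the loop would start it again. A real supervisor adds logging, backoff, and start at boot, which this sketch leaves out.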
>
>
> On Tue, Oct 14, 2014 at 8:09 AM, Markus Jelsma <markus@openindex.io>
> wrote:
>
> > This will do:
> > kill -9 `ps aux | grep -v grep | grep tomcat6 | awk '{print $2}'`
> >
> > pkill should also work
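
The ps/grep pipeline above matches against the full command line, so the closest pkill equivalent uses the -f option. A minimal sketch, assuming procps-style pgrep and pkill are installed; the function names are hypothetical, and tomcat6 is simply the pattern from the message above.

```shell
#!/bin/sh
# Hypothetical helpers; the names list_matching and kill_matching are illustrative only.

# Print the PIDs of processes whose full command line matches the pattern.
# -f matches against the whole command line rather than just the process name.
list_matching() {
    pgrep -f "$1"
}

# Send SIGKILL to every process whose full command line matches the pattern.
# pgrep and pkill never match themselves, so no "grep -v grep" step is needed.
kill_matching() {
    pkill -9 -f "$1"
}

# Example (commented out so that sourcing this file has no side effects):
# list_matching tomcat6    # check what would be hit first
# kill_matching tomcat6
```

Listing the matches before killing is worth the extra step, since a loose pattern can hit unrelated processes whose command line happens to contain the same string.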
> >
> > On Tuesday 14 October 2014 07:02:03 Yago Riveiro wrote:
> > > Boogie,
> > >
> > > Any example for java_error.sh script?
> > >
> > > —
> > > /Yago Riveiro
> > >
> > > On Tue, Oct 14, 2014 at 2:48 PM, Boogie Shafer <Boogie.Shafer@proquest.com>
> > > wrote:
> > > > a really simple approach is to have the OOM generate an email
> > > > e.g.
> > > >
> > > > 1) create a simple script (call it java_oom.sh) and drop it in your
> > > > tomcat bin dir
> > > >
> > > > echo `date` | mail -s "Java Error: OutOfMemory - $HOSTNAME" notify@domain.com
> > > >
> > > > 2) configure your java options (in setenv.sh or similar) to trigger
> > > > heap dump and the email script when OOM occurs
> > > >
> > > > # config error behaviors
> > > > CATALINA_OPTS="$CATALINA_OPTS -XX:+HeapDumpOnOutOfMemoryError
> > > > -XX:HeapDumpPath=$TOMCAT_DIR/temp/tomcat-dump.hprof
> > > > -XX:OnError=$TOMCAT_DIR/bin/java_error.sh
> > > > -XX:OnOutOfMemoryError=$TOMCAT_DIR/bin/java_oom.sh
> > > > -XX:ErrorFile=$TOMCAT_DIR/temp/java_error%p.log"
> > > > ________________________________________
> > > > From: Mark Miller <markrmiller@gmail.com>
> > > > Sent: Tuesday, October 14, 2014 06:30
> > > > To: solr-user@lucene.apache.org
> > > > Subject: Re: Recovering from Out of Mem
> > > > Best is to pass the Java cmd line option that kills the process on
> > > > OOM and set up a supervisor on the process to restart it.  You need a
> > > > somewhat recent release for this to work properly though.
> > > >
> > > > - Mark
> > > >
> > > >> On Oct 14, 2014, at 9:06 AM, Salman Akram
> > > >> <salman.akram@northbaysolutions.net> wrote:
> > > >>
> > > >> I know there are some suggestions to avoid OOM issues, e.g. setting
> > > >> an appropriate max heap size. However, what's the best way to
> > > >> recover once it goes into a non-responding state? We are using
> > > >> Tomcat on the back end.
> > > >>
> > > >> The scenario is that once we hit an OOM it keeps on taking queries
> > > >> (doesn't give any error) but they just time out. So even though we
> > > >> have a failover system implemented, we don't have a way to
> > > >> distinguish whether these are real query timeouts or due to OOM.
> > > >>
> > > >> --
> > > >> Regards,
> > > >>
> > > >> Salman Akram
> >
> >
>



-- 
Regards,

Salman Akram
