manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: agents process ran out of memory
Date Wed, 15 Apr 2015 16:20:03 GMT
If you can find a specific file that causes Tika to either run out of stack
or use huge quantities of memory, it would be great to include it (if
possible) in a TIKA jira ticket.  We'd need a stack trace, of course,
showing that Tika is responsible.

Thanks,
Karl
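
One way to show which component is responsible is to collapse the repeated java.util.regex frames in the saved trace and look at what remains. Below is a minimal Python sketch of that idea, assuming the trace has been copied out of the raw manifoldcf.log as plain text. The helper names and the org.example caller frame at the end of the sample are hypothetical and are not taken from the logs in this thread. Note also that, as far as I know, HotSpot records only a limited number of frames per trace by default (1024, adjustable with -XX:MaxJavaStackTraceDepth), so the outermost frames can be missing from a very deep trace.

```python
from itertools import groupby

REGEX_PREFIX = "java.util.regex."


def summarize_trace(lines):
    """Return [(frame, repeat_count), ...] with consecutive duplicate frames collapsed."""
    frames = []
    for line in lines:
        stripped = line.strip()
        # Java prints each stack frame as "at <class>.<method>(<file>:<line>)".
        if stripped.startswith("at "):
            frames.append(stripped[3:].strip())
    return [(frame, len(list(group))) for frame, group in groupby(frames)]


def first_non_regex_frame(summary):
    """Return the first frame outside java.util.regex, or None if there is none."""
    for frame, _count in summary:
        if not frame.startswith(REGEX_PREFIX):
            return frame
    return None


# The regex frames are copied from the trace quoted in this thread.
# The final org.example frame is a HYPOTHETICAL caller, added for illustration only.
SAMPLE = """\
FATAL 2015-04-15 12:14:35,645 (Worker thread '5') - Error tossed: null
java.lang.StackOverflowError
        at java.util.regex.Pattern$CharProperty.match(Pattern.java:3776)
        at java.util.regex.Pattern$Curly.match0(Pattern.java:4250)
        at java.util.regex.Pattern$Curly.match0(Pattern.java:4263)
        at java.util.regex.Pattern$Curly.match0(Pattern.java:4263)
        at org.example.Caller.parse(Caller.java:10)
"""

if __name__ == "__main__":
    summary = summarize_trace(SAMPLE.splitlines())
    for frame, count in summary:
        print(f"{count:5d} x {frame}")
    print("caller:", first_non_regex_frame(summary))
```

If first_non_regex_frame returns None, the trace was cut off before it left the regex engine, and the calling code cannot be identified from that trace alone.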


On Wed, Apr 15, 2015 at 11:51 AM, Kamil Żyta <kamil.zyta@pwr.edu.pl> wrote:

> On Wed, Apr 15, 2015 at 11:16:44AM -0400, Karl Wright wrote:
> > Hi Kamil,
> >
> > I bet that it is one specific file that was causing the problem.  By
> > increasing the stack space, you allowed the file to be processed.  Now it
> > won't get processed again until it changes.
> >
> > My thought is that this is *probably* related to Tika.  Are you using the
> > Tika transformer?
>
> Yes, I use the Tika transformer, and I think this is related to Tika too, but
> I don't know which file causes the problem. I have two identical jobs (one
> for continuous crawl and one for deletion); these jobs report different
> document counts, and only the continuous job causes regex errors.
>
> Another job gives me "agents process ran out of memory - shutting down", but
> this is related to Tika too. I excluded one file and now it is working.
>
> K
>
> >
> >
> > On Wed, Apr 15, 2015 at 9:11 AM, Kamil Żyta <kamil.zyta@pwr.edu.pl> wrote:
> >
> > > I stopped all agents, removed all logs, added '-Xss500m' to the options
> > > file, started the agents, and the errors are gone. Then I removed
> > > '-Xss500m' from the options to trap the source of the problem, restarted
> > > all agents, and there are still no errors.
> > >
> > > *magic*
> > >
> > > Thx Karl for your patience with my weird problems.
> > >
> > > K
> > >
> > > On Wed, Apr 15, 2015 at 08:39:52AM -0400, Karl Wright wrote:
> > > > Hi Kamil,
> > > >
> > > > I believe your logs are probably "rolling".  This means that when the
> > > > log gets full, or another day starts, a new log file starts.  I don't
> > > > know, of course, because I did not configure your system.
> > > >
> > > > What I *do* know is that the stack trace that you are providing me is
> > > > incomplete, and while it is clear that the Java regular expression
> > > > parser is failing in some way (by doing infinite recursion), I have no
> > > > idea what *context* this is occurring in, without the end of that
> > > > stack trace.
> > > >
> > > > This may be occurring almost anywhere, which is why I need the trace.
> > > > Even String.replace() and String.split() use regexps and can be at
> > > > fault.  Without a definitive source, there's little I can do.
> > > >
> > > > One thing you can certainly try is to provide a larger amount of stack
> > > > space to the JVM and just hope the problem goes away.  That would mean
> > > > editing one of the options files and adding a parameter:
> > > >
> > > > -Xss500m
> > > >
> > > > (for instance)
> > > >
> > > > If you would rather get to the source of the problem, I suggest the
> > > > following:
> > > >
> > > > (1) Shut down all agents processes
> > > > (2) Remove all logs
> > > > (3) Start the agents process
> > > > (4) Tail the log looking for "FATAL": tail -f manifoldcf.log | grep FATAL
> > > > (5) As soon as you see that, shut down the agents process
> > > > (6) Look at the log file produced
> > > >
> > > > References:
> > > >
> > > > http://stackoverflow.com/questions/7509905/java-lang-stackoverflowerror-while-using-a-regex-to-parse-big-strings
> > > >
> > > > Karl
> > > >
> > > >
> > > > On Wed, Apr 15, 2015 at 8:28 AM, Kamil Żyta <kamil.zyta@pwr.edu.pl> wrote:
> > > >
> > > > > # java -version
> > > > > java version "1.8.0_45"
> > > > > Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
> > > > > Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
> > > > >
> > > > > Is it broken? I don't know. How can I prevent the backtrace from rolling?
> > > > > It looks like an infinite loop to me.
> > > > >
> > > > > K
> > > > >
> > > > > On Wed, Apr 15, 2015 at 07:41:37AM -0400, Karl Wright wrote:
> > > > > > Clearly the logs must have rolled then?  Either that or you are using a
> > > > > > broken jdk.
> > > > > >
> > > > > > Karl
> > > > > >
> > > > > >
> > > > > > On Wed, Apr 15, 2015 at 7:37 AM, Kamil Żyta <kamil.zyta@pwr.edu.pl> wrote:
> > > > > >
> > > > > > > On Wed, Apr 15, 2015 at 07:27:56AM -0400, Karl Wright wrote:
> > > > > > > > Hi Kamil:
> > > > > > > >
> > > > > > > > kawright@duck76:/data/kawright/analysis$ gzip --version
> > > > > > > > gzip 1.4
> > > > > > > > Copyright (C) 2007 Free Software Foundation, Inc.
> > > > > > > > Copyright (C) 1993 Jean-loup Gailly.
> > > > > > > > This is free software.  You may redistribute copies of it under the terms of
> > > > > > > > the GNU General Public License <http://www.gnu.org/licenses/gpl.html>.
> > > > > > > > There is NO WARRANTY, to the extent permitted by law.
> > > > > > > >
> > > > > > > > Written by Jean-loup Gailly.
> > > > > > > > kawright@duck76:/data/kawright/analysis$
> > > > > > > >
> > > > > > > >
> > > > > > > > But in any case the key part of the stack trace is further down, probably
> > > > > > > > MUCH further down.
> > > > > > > >
> > > > > > > > If I were you, I'd unzip the whole log and use head, tail, and grep to find
> > > > > > > > where the exception trace ends.
> > > > > > >
> > > > > > > I used grep -v and sent you the logs before, but you don't believe me.
> > > > > > > These are all the mcf logs: http://pastebin.com/T54NKwTh
> > > > > > > http://pastebin.com/uMxaUnGi
> > > > > > >
> > > > > > > K
> > > > > > >
> > > > > > > >
> > > > > > > > On Wed, Apr 15, 2015 at 7:18 AM, Kamil Żyta <kamil.zyta@pwr.edu.pl> wrote:
> > > > > > > >
> > > > > > > > > Hmm, try tar -xf manifoldcf.log.gz or maybe zless?
> > > > > > > > > It works for me with:
> > > > > > > > > > gzip --version
> > > > > > > > > gzip 1.6
> > > > > > > > >
> > > > > > > > > To be sure, I attached the uncompressed file.
> > > > > > > > >
> > > > > > > > > K
> > > > > > > > >
> > > > > > > > > On Wed, Apr 15, 2015 at 07:10:07AM -0400, Karl Wright wrote:
> > > > > > > > > > Hi Kamil,
> > > > > > > > > >
> > > > > > > > > > >>>>>>
> > > > > > > > > > kawright@duck76:~$ cd /data/kawright/analysis/
> > > > > > > > > > kawright@duck76:/data/kawright/analysis$ gunzip manifoldcf.log.gz
> > > > > > > > > >
> > > > > > > > > > gzip: manifoldcf.log.gz: invalid compressed data--crc error
> > > > > > > > > >
> > > > > > > > > > gzip: manifoldcf.log.gz: invalid compressed data--length error
> > > > > > > > > > kawright@duck76:/data/kawright/analysis$
> > > > > > > > > >
> > > > > > > > > > <<<<<<
> > > > > > > > > >
> > > > > > > > > > Karl
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Wed, Apr 15, 2015 at 6:41 AM, Kamil Żyta <kamil.zyta@pwr.edu.pl> wrote:
> > > > > > > > > >
> > > > > > > > > > > These 1k lines are all the same. I attached the full manifoldcf.log.
> > > > > > > > > > >
> > > > > > > > > > > K
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Apr 15, 2015 at 06:33:06AM -0400, Karl Wright wrote:
> > > > > > > > > > > > Hi Kamil,
> > > > > > > > > > > >
> > > > > > > > > > > > There is a complete trace in there, believe me.  The JVM did not say:
> > > > > > > > > > > > "(...) ~1k lines".  What I need is at the bottom of those 1K lines.
> > > > > > > > > > > >
> > > > > > > > > > > > Karl
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Apr 15, 2015 at 6:23 AM, Kamil Żyta <kamil.zyta@pwr.edu.pl> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > How can I provide a usable stack trace? I can only copy what the logs say.
> > > > > > > > > > > > > Now it's a lot of:
> > > > > > > > > > > > > FATAL 2015-04-15 12:14:35,645 (Worker thread '5') - Error tossed: null
> > > > > > > > > > > > > java.lang.StackOverflowError
> > > > > > > > > > > > >         at java.util.regex.Pattern$CharProperty.match(Pattern.java:3776)
> > > > > > > > > > > > >         at java.util.regex.Pattern$Curly.match0(Pattern.java:4250)
> > > > > > > > > > > > >         at java.util.regex.Pattern$Curly.match0(Pattern.java:4263)
> > > > > > > > > > > > >         (...) ~1k lines
> > > > > > > > > > > > >
> > > > > > > > > > > > > for the continuous job, but the agents process is not exiting. Probably
> > > > > > > > > > > > > these two errors below aren't correlated (patterns and agents oom).
> > > > > > > > > > > > >
> > > > > > > > > > > > > K
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, Apr 14, 2015 at 05:28:18PM -0400, Karl Wright wrote:
> > > > > > > > > > > > > > Without some kind of usable stack trace I can't really help you.  It
> > > > > > > > > > > > > > looks like some regular expression is going completely haywire, but I
> > > > > > > > > > > > > > have no idea which one.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Karl
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Tue, Apr 14, 2015 at 4:31 PM, Kamil Żyta <kamil.zyta@pwr.edu.pl> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Tue, Apr 14, 2015 at 04:12:55PM -0400, Karl Wright wrote:
> > > > > > > > > > > > > > > > Hi Kamil,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Without the bottom of the stack trace, I can't even tell what it is doing.
> > > > > > > > > > > > > > > > Where are you supplying a regular expression?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > It's all I have; the only regular expression is in 'Paths':
> > > > > > > > > > > > > > > 3. Exclude file(s) or directory(s) matching */.*
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I found the files (~500MB, logs) at which the solr logs end;
> > > > > > > > > > > > > > > excluding them solves the problem. mcf uses tika for extraction
> > > > > > > > > > > > > > > and only sends /update to solr. These files caused problems before,
> > > > > > > > > > > > > > > when using solr to extract docs. Now mcf dies and I do not even know why.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > K
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Running out of memory might be a side effect of running out of stack.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Karl
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Tue, Apr 14, 2015 at 2:49 PM, Kamil Żyta <kamil.zyta@pwr.edu.pl> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > > > > > the agents process exits with:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > agents process ran out of memory - shutting down
> > > > > > > > > > > > > > > > > java.lang.OutOfMemoryError: Java heap space
> > > > > > > > > > > > > > > > >         at java.util.Arrays.copyOfRange(Arrays.java:3664)
> > > > > > > > > > > > > > > > >         at java.lang.String.<init>(String.java:201)
> > > > > > > > > > > > > > > > >         at java.lang.StringBuilder.toString(StringBuilder.java:407)
> > > > > > > > > > > > > > > > >         at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.buildSolrDocument(HttpPoster.java:987)
> > > > > > > > > > > > > > > > >         at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:882)
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > worker threads:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > FATAL 2015-04-14 18:59:11,172 (Worker thread '32') - Error tossed: null
> > > > > > > > > > > > > > > > > java.lang.StackOverflowError
> > > > > > > > > > > > > > > > >         at java.util.regex.Pattern$CharProperty.match(Pattern.java:3776)
> > > > > > > > > > > > > > > > >         at java.util.regex.Pattern$Curly.match0(Pattern.java:4250)
> > > > > > > > > > > > > > > > >         at java.util.regex.Pattern$Curly.match0(Pattern.java:4263)
> > > > > > > > > > > > > > > > >         at java.util.regex.Pattern$Curly.match0(Pattern.java:4263)
> > > > > > > > > > > > > > > > >         at java.util.regex.Pattern$Curly.match0(Pattern.java:4263)
> > > > > > > > > > > > > > > > >         (...) ~1k lines
> > > > > > > > > > > > > > > > >         at java.util.regex.Pattern$Curly.match0(Pattern.java:4263)
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > No errors/warnings in the solr logs.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Is it a bug or just a corrupted file?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > K
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > >
>
