manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Tika/POI bugs
Date Fri, 27 Jul 2018 14:42:54 GMT
To solve your production problem I highly recommend limiting the size of
the docs fed to Tika, for a start.  But that is no guarantee, I understand.
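
As a rough illustration of the size-limiting idea, the sketch below lists files above a size threshold so they can be excluded or tested individually. It assumes the documents are reachable as a local or mounted directory tree, which may not match your repository connector; the function name `find_oversized_files` and the threshold are hypothetical and not part of ManifoldCF or Tika.

```python
import os


def find_oversized_files(root, max_bytes):
    """Return (path, size) pairs for files under root larger than max_bytes.

    Results are sorted largest first. Hypothetical helper, not a ManifoldCF API.
    """
    oversized = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # Skip files that vanish or cannot be read during the scan.
                continue
            if size > max_bytes:
                oversized.append((path, size))
    oversized.sort(key=lambda item: item[1], reverse=True)
    return oversized
```

For example, `find_oversized_files("/mnt/share", 10 * 1024 * 1024)` would report everything over 10 MB under an assumed mount point; both the path and the 10 MB figure are placeholders to adjust.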

Out of memory problems are very hard to get good forensics for because they
cause major disruptions to the running server.  You could turn on a degree
of logging so that you can see what documents are being processed at any
time by all threads, but that is pretty verbose.  In your properties.xml
file, add <property name="org.apache.manifoldcf.crawlerthreads"
value="DEBUG"/>.  But I suspect that will generate far too much noise.
Still, it's the best I can offer.

Karl


On Fri, Jul 27, 2018 at 7:52 AM msaunier <msaunier@citya.com> wrote:

> Hi Karl,
>
>
>
> Okay. For the Out of Memory:
>
>
>
> This is the last day I can spend finding out where the error comes
> from. After that, I have to go into production to meet my deadlines.
>
> I hope to find time in the future to fix this problem on this
> server; otherwise I cannot index it. Unfortunately, it is very difficult
> to find the documents that cause this error. I did not find any trace in
> the database. Even in debug mode, it is difficult to find the problematic
> document. Maybe if I limit the crawl to 1 thread I could find it more
> easily, but I'm afraid the crawl would then take a very long time.
>
> Maybe you have an idea of the best method to adopt to find this / these
> documents?
>
>
>
> Maxence
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Friday, July 27, 2018 12:47
> *To:* dev <dev@manifoldcf.apache.org>; user@manifoldcf.apache.org
> *Subject:* Tika/POI bugs
>
>
>
> Hi all,
>
>
>
> I've easily spent 40 hours over the last two weeks chasing down bugs in
> Apache Tika and POI.  The two kinds I see are "ClassNotFound" (due to usage
> of the wrong ClassLoader), and "OutOfMemoryError" (not clear what it is due
> to yet).
>
> I don't have enough time to create tickets directly in Tika for all
> possible documents where these failures occur, so I urge our users to
> create tickets DIRECTLY in the Tika project in Jira.  I guess you can let
> the Tika people create the POI tickets, if need be.  For OutOfMemory
> problems, please attach the file that causes the problem to the ticket, and
> also the amount of memory you gave the agents process.  For ClassNotFound
> problems, also include the stack trace.
>
>
>
> Thanks in advance,
>
> Karl
>
