tika-dev mailing list archives

From "Tim Allison (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (TIKA-2725) Make tika-server robust against ooms/infinite loops/memory leaks
Date Tue, 11 Sep 2018 20:48:00 GMT

     [ https://issues.apache.org/jira/browse/TIKA-2725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Tim Allison resolved TIKA-2725.
       Resolution: Fixed
    Fix Version/s: 2.0.0

I committed what I'd declare to be an experimental/beta version of this.  The legacy tika-server
behavior is untouched.  To trigger the new version, add {{-spawnChild}} to the command line.
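As a rough illustration of what {{-spawnChild}} implies, here is a minimal Java sketch of a parent re-launching the server in a fresh JVM. The class {{ChildLauncher}}, its methods, the main-class parameter, and the choice to strip {{-spawnChild}} from the child's arguments are all assumptions for illustration. None of them is tika-server's actual code.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical launcher: illustrates the idea of forking a child JVM, not tika-server's code.
final class ChildLauncher {

    /** Builds the child's command line: same JVM binary and classpath, minus -spawnChild. */
    static List<String> childCommand(String javaBin, String classpath,
                                     String mainClass, List<String> parentArgs) {
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        cmd.add("-cp");
        cmd.add(classpath);
        cmd.add(mainClass);
        for (String arg : parentArgs) {
            // Assumption: without the flag, the child runs the server in-process as before.
            if (!arg.equals("-spawnChild")) {
                cmd.add(arg);
            }
        }
        return cmd;
    }

    /** Starts the child, reusing the parent's own java binary and classpath (Java 9+). */
    static Process launch(String mainClass, List<String> parentArgs) throws IOException {
        String javaBin = ProcessHandle.current().info().command().orElse("java");
        String classpath = System.getProperty("java.class.path");
        return new ProcessBuilder(childCommand(javaBin, classpath, mainClass, parentArgs))
                .inheritIO() // child's stdout/stderr go to the parent's console
                .start();
    }
}
```

Reusing the parent's binary and classpath keeps the child on the same JVM version and jars. The parent then only has to watch the returned {{Process}} and restart it if it dies.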

Still to be done, IMHO, before I forget...

1) Update the wiki.

2) Clean up the shutdown procedure – if there's a parse timeout, we should allow x milliseconds
for the other parses to complete before killing the server (and restarting!)

3) Clean up the thread in the watchdog that checks for ping timeouts.  We need to lock/synchronize
to ensure that the child process is not null when the timeout goes to kill the child.

4) Add maxRestarts as a parameter.

5) Gracefully handle failed JVM start-ups.

6) Figure out how to add an interceptor for "not available" if the child is in the process
of shutting down – get rid of {{checkIsOperating()}}.

7) Figure out why port 9998 is still taken by the other unit tests.
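Items 3 to 5 interact because the ping-timeout thread, the restart logic and a failed start-up all touch the same child reference. The following is a minimal Java sketch of guarding that reference with a single lock. It assumes a hypothetical {{ChildWatchdog}} class, a hypothetical {{ChildHandle}} interface, and assumed {{maxRestarts}} semantics. None of these is tika-server's API.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical parent-side watchdog; names and structure are illustrative only.
final class ChildWatchdog {

    /** Hypothetical handle on the forked server; a real one would wrap java.lang.Process. */
    interface ChildHandle {
        void kill();
    }

    private final ReentrantLock lock = new ReentrantLock();
    private final Supplier<ChildHandle> launcher; // starts a new child JVM; may throw on failure
    private final int maxRestarts;                // item 4 (assumed semantics)
    private final long pingTimeoutMillis;

    // The fields below are read and written only while holding `lock`.
    private ChildHandle child;
    private long lastPingMillis;
    private int restarts;
    private boolean everStarted;

    ChildWatchdog(Supplier<ChildHandle> launcher, int maxRestarts, long pingTimeoutMillis) {
        this.launcher = launcher;
        this.maxRestarts = maxRestarts;
        this.pingTimeoutMillis = pingTimeoutMillis;
    }

    /** Kills any current child, then starts a new one. Returns false if the restart
     *  budget is spent or the new JVM fails to start (item 5). */
    boolean startOrRestart(long nowMillis) {
        lock.lock();
        try {
            if (child != null) {
                child.kill();
                child = null;
            }
            if (everStarted) {
                if (restarts >= maxRestarts) {
                    return false;
                }
                restarts++;
            }
            everStarted = true;
            try {
                child = launcher.get();
            } catch (RuntimeException e) {
                return false; // failed start-up: leave child null instead of crashing the parent
            }
            lastPingMillis = nowMillis;
            return true;
        } finally {
            lock.unlock();
        }
    }

    /** Records a successful ping reply from the child. */
    void recordPing(long nowMillis) {
        lock.lock();
        try {
            lastPingMillis = nowMillis;
        } finally {
            lock.unlock();
        }
    }

    /** Called periodically by the ping-timeout thread (item 3). Holding the lock means the
     *  child cannot be swapped out or nulled between the null check and kill(). */
    boolean killIfPingTimedOut(long nowMillis) {
        lock.lock();
        try {
            if (child == null || nowMillis - lastPingMillis <= pingTimeoutMillis) {
                return false;
            }
            child.kill();
            child = null;
            return true;
        } finally {
            lock.unlock();
        }
    }

    int restarts() {
        lock.lock();
        try {
            return restarts;
        } finally {
            lock.unlock();
        }
    }
}
```

The clock is passed in as {{nowMillis}} rather than read inside the class. This keeps the timeout and restart logic deterministic to test. A real watchdog thread would simply pass {{System.currentTimeMillis()}}.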

Finally, many thanks to [~jukkaz] and the ForkParser, from which I plagiarized quite a bit.
:)  Problems are, of course, my own.

> Make tika-server robust against ooms/infinite loops/memory leaks
> ----------------------------------------------------------------
>                 Key: TIKA-2725
>                 URL: https://issues.apache.org/jira/browse/TIKA-2725
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Assignee: Tim Allison
>            Priority: Major
>             Fix For: 1.19, 2.0.0
> Currently, tika-server is vulnerable to OOMs, infinite loops and memory leaks.  I see
> two ways of making it robust:
> 1) use the ForkParser
> 2) have tika-server spawn a child process that actually runs the server, put a watcher
> thread in the child that will kill the child on OOM/timeout/after x files.  The parent process
> can then restart the child if it dies.
> I somewhat prefer 2) so that we don't have to pass the input stream twice.  I propose
> 2), and I propose making it optional in Tika 1.x, but then the default in Tika 2.x.  We could
> also add a status ping from parent to child in case the child gets caught up in a stop-the-world
> GC pause (h/t [~bleskes]).
> Other options/recommendations?
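The child-side watcher from option 2) can be sketched as a class that tracks in-flight parses. The following minimal Java sketch assumes a hypothetical {{ParseWatcher}} class, hypothetical exit codes, and an injectable exit action. In a real child, that action would be something like {{Runtime.getRuntime()::halt}}, which skips shutdown hooks that might hang in a broken JVM. The HotSpot flag {{-XX:+ExitOnOutOfMemoryError}} is an alternative to catching {{OutOfMemoryError}} by hand.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.IntConsumer;

// Hypothetical child-side watcher; not Tika's actual implementation.
final class ParseWatcher {
    static final int EXIT_TIMEOUT = 1;   // hypothetical exit codes the parent could inspect
    static final int EXIT_MAX_FILES = 2;
    static final int EXIT_OOM = 3;

    private final Map<Long, Long> inFlight = new ConcurrentHashMap<>(); // parse id -> start millis
    private final AtomicLong nextId = new AtomicLong();
    private final AtomicLong filesDone = new AtomicLong();
    private final long timeoutMillis;
    private final long maxFiles;
    private final IntConsumer exit; // assumption: Runtime.getRuntime()::halt in a real child

    ParseWatcher(long timeoutMillis, long maxFiles, IntConsumer exit) {
        this.timeoutMillis = timeoutMillis;
        this.maxFiles = maxFiles;
        this.exit = exit;
    }

    /** Call before each parse; returns an id to pass to parseFinished. */
    long parseStarted(long nowMillis) {
        long id = nextId.incrementAndGet();
        inFlight.put(id, nowMillis);
        return id;
    }

    /** Call after each parse; exits once this child has handled maxFiles files. */
    void parseFinished(long id) {
        inFlight.remove(id);
        if (filesDone.incrementAndGet() >= maxFiles) {
            exit.accept(EXIT_MAX_FILES);
        }
    }

    /** Call from a catch (OutOfMemoryError e) block around the parse. */
    void onOutOfMemory() {
        exit.accept(EXIT_OOM);
    }

    /** Exits if any in-flight parse has run longer than timeoutMillis (e.g. an infinite loop). */
    void check(long nowMillis) {
        for (long start : inFlight.values()) {
            if (nowMillis - start > timeoutMillis) {
                exit.accept(EXIT_TIMEOUT);
                return;
            }
        }
    }

    /** Starts a daemon thread that calls check() every pollMillis. */
    Thread startWatcherThread(long pollMillis) {
        Thread t = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                check(System.currentTimeMillis());
                try {
                    Thread.sleep(pollMillis);
                } catch (InterruptedException e) {
                    return;
                }
            }
        }, "parse-watcher");
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

Exiting the whole child, rather than trying to interrupt a runaway parse, is what makes this robust. The parent sees the process die and restarts it, so a stuck or leaking parser never outlives its JVM.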

This message was sent by Atlassian JIRA
