tika-dev mailing list archives

From "Tim Allison (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2725) Make tika-server robust against ooms/infinite loops/memory leaks
Date Thu, 06 Sep 2018 20:47:00 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16606373#comment-16606373 ]

Tim Allison commented on TIKA-2725:
-----------------------------------

{quote}
Ideally, tika-server is dockerized and runs on Swarm as a service. In
addition, it has a healthcheck mechanism, say something like an HTTP GET
request that returns status code 200. Docker runs this healthcheck
periodically and, if it fails, restarts tika-server.
However, we are far from that. Two ways to go, from my point of view: 1. your
second option, or an OS daemon that checks tika-server availability or
something like that. We can use cron on Linux to run our "healthcheck" and,
if it detects an anomaly, restart the server. We can probably find a similar
mechanism for Windows as well.
{quote}
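
For illustration, the "HTTP GET that returns 200" check described in the quote could be sketched roughly as below. This is only a sketch under assumptions: the class and method names (HealthCheck, isHealthy) are hypothetical, and the URL to probe is whatever endpoint the deployment chooses to expose. Any connection failure, timeout, or non-200 status is treated as unhealthy.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper, not part of Tika: probes a URL and reports
// whether it answered an HTTP GET with status 200 within the timeout.
final class HealthCheck {

    static boolean isHealthy(String url, int timeoutMillis) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestMethod("GET");
            conn.setConnectTimeout(timeoutMillis); // fail fast if the port is closed or unreachable
            conn.setReadTimeout(timeoutMillis);    // fail if the server accepts but never answers
            return conn.getResponseCode() == 200;
        } catch (IOException e) {
            // Connection refused, timeout, malformed URL, etc. all count as unhealthy.
            return false;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }
}
```

A cron job or daemon could call something like HealthCheck.isHealthy(url, 5000) on a schedule and trigger a restart when it returns false. The read timeout matters here because a server stuck in an infinite loop may still accept connections without ever responding.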

CommonsExec?

> Make tika-server robust against ooms/infinite loops/memory leaks
> ----------------------------------------------------------------
>
>                 Key: TIKA-2725
>                 URL: https://issues.apache.org/jira/browse/TIKA-2725
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Assignee: Tim Allison
>            Priority: Major
>
> Currently, tika-server is vulnerable to OOMs, infinite loops and memory leaks.  I see
> two ways of making it robust:
> 1) use the ForkParser
> 2) have tika-server spawn a child process that actually runs the server, and put a watcher
> thread in the child that will kill the child on OOM/timeout/after x files.  The parent
> process can then restart the child if it dies.
> I somewhat prefer 2) so that we don't have to pass the InputStream twice.  I propose
> 2), and I propose making it optional in Tika 1.x, but then the default in Tika 2.x.  We could
> also add a status ping from parent to child in case the child gets caught up in
> stop-the-world GC (h/t [~bleskes]).
> Other options/recommendations?
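
The parent side of option 2), including the status ping, might look roughly like the sketch below. It is a minimal illustration and not the tika-server implementation: the names ChildWatchdog and superviseChild, the restart limit, and the childIsResponsive callback are all hypothetical. It uses plain ProcessBuilder from the JDK instead of Commons Exec, and the watcher thread inside the child is not shown.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical parent-side supervisor: starts the child, pings it periodically,
// kills it if it stops responding, and restarts it if it dies abnormally.
final class ChildWatchdog {

    /**
     * Runs the command as a child process and restarts it until it exits
     * cleanly (exit code 0) or maxRestarts restarts have been used.
     * Returns the number of times the child was started.
     */
    static int superviseChild(List<String> command,
                              int maxRestarts,
                              long pingIntervalMillis,
                              BooleanSupplier childIsResponsive)
            throws IOException, InterruptedException {
        int starts = 0;
        while (starts <= maxRestarts) {
            Process child = new ProcessBuilder(command).inheritIO().start();
            starts++;

            // waitFor returns false when the interval elapses and the child is still running.
            while (!child.waitFor(pingIntervalMillis, TimeUnit.MILLISECONDS)) {
                if (!childIsResponsive.getAsBoolean()) {
                    // Child is alive but unresponsive (e.g. stuck in GC or a loop): kill it.
                    child.destroyForcibly();
                    child.waitFor();
                    break;
                }
            }

            // The child has terminated on every path that reaches this point.
            if (child.exitValue() == 0) {
                return starts; // clean shutdown, do not restart
            }
        }
        return starts;
    }
}
```

In this sketch the ping is just a callback, so it could be an HTTP request to the child, a message over a pipe, or anything else that a child stuck in a long GC pause would fail to answer in time. Treating exit code 0 as "do not restart" is an assumption of the sketch; a real server would need its own convention for deliberate shutdown.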



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
