tika-dev mailing list archives

From "Tim Allison (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2725) Make tika-server robust against ooms/infinite loops/memory leaks
Date Thu, 06 Sep 2018 15:24:00 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16605919#comment-16605919 ]

Tim Allison commented on TIKA-2725:
-----------------------------------

{quote}In this approach, probably it is the only way ...

What is the typical environment for tika-server? Stand-alone, distributed ... like replicas
in a cluster?

Is there a time limit for recovery? How do we know which point to resume processing from?

Do we mark documents that have been processed?

For example, if tika-server ran on Docker Swarm/K8s, the orchestrator would restart
a failed replica itself ...
{quote}

> Make tika-server robust against ooms/infinite loops/memory leaks
> ----------------------------------------------------------------
>
>                 Key: TIKA-2725
>                 URL: https://issues.apache.org/jira/browse/TIKA-2725
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Assignee: Tim Allison
>            Priority: Major
>
> Currently, tika-server is vulnerable to OOMs, infinite loops and memory leaks.  I see
> two ways of making it robust:
> 1) use the ForkParser
> 2) have tika-server spawn a child process that actually runs the server, and put a watcher
> thread in the child that will kill the child on OOM, on timeout, or after x files.  The
> parent process can then restart the child if it dies.
> I somewhat prefer 2) so that we don't have to pass the input stream twice.  I propose
> 2), and I propose making it optional in Tika 1.x and then the default in Tika 2.x.  We could
> also add a status ping from parent to child in case the child gets caught up in
> stop-the-world GC (h/t [~bleskes]).
> Other options/recommendations?
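
For illustration only, option 2) might be sketched roughly as below. This is not
tika-server code: the class WatchdogSketch and the names shouldExit, startWatcher and
superviseChild are all hypothetical, and the 100 ms poll interval is an arbitrary choice.
The sketch covers the timeout and "after x files" triggers plus the parent's restart loop.
It does not cover OOM detection (one possibility, as an assumption, is launching the child
JVM with HotSpot's -XX:+ExitOnOutOfMemoryError) or the parent-to-child status ping.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of option 2; none of these names come from Tika. */
final class WatchdogSketch {

    /** Child side: decide whether the watcher thread should kill this JVM. */
    static boolean shouldExit(long filesProcessed, long maxFiles,
                              long taskStartMillis, long nowMillis, long timeoutMillis) {
        if (filesProcessed >= maxFiles) {
            return true; // periodic restart bounds slow memory leaks
        }
        // taskStartMillis < 0 means no parse is currently running
        return taskStartMillis >= 0 && nowMillis - taskStartMillis > timeoutMillis;
    }

    /** Child side: daemon thread that polls the counters and halts the JVM. */
    static Thread startWatcher(AtomicLong filesProcessed, AtomicLong taskStartMillis,
                               long maxFiles, long timeoutMillis) {
        Thread watcher = new Thread(() -> {
            while (true) {
                if (shouldExit(filesProcessed.get(), maxFiles, taskStartMillis.get(),
                        System.currentTimeMillis(), timeoutMillis)) {
                    // halt() skips shutdown hooks, which may never finish in a wedged JVM
                    Runtime.getRuntime().halt(1);
                }
                try {
                    Thread.sleep(100); // assumed poll interval
                } catch (InterruptedException e) {
                    return; // stop watching if interrupted
                }
            }
        }, "watcher");
        watcher.setDaemon(true);
        watcher.start();
        return watcher;
    }

    /** Parent side: start the child, block until it dies, then start it again. */
    static int superviseChild(List<String> childCommand, int maxStarts)
            throws IOException, InterruptedException {
        int starts = 0;
        while (starts < maxStarts) {
            Process child = new ProcessBuilder(childCommand).inheritIO().start();
            starts++;
            int exitCode = child.waitFor();
            System.err.println("child exited with code " + exitCode);
        }
        return starts; // number of times the child was started
    }
}
```

In this sketch the server code in the child would set taskStartMillis before each parse,
reset it to -1 afterwards, and increment filesProcessed per request. The parent would call
superviseChild with whatever command line launches the child server. Keeping the decision
in a pure function (shouldExit) makes the kill conditions easy to unit test without
spawning processes.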



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
