tika-dev mailing list archives

From Oleg Tikhonov <o...@apache.org>
Subject Re: [jira] [Commented] (TIKA-2725) Make tika-server robust against ooms/infinite loops/memory leaks
Date Thu, 06 Sep 2018 14:15:31 GMT
With this approach, that is probably the only way ...
What is the typical environment for tika-server: stand-alone, or distributed,
e.g. replicas in a cluster?
Is there a time limit for recovery? How do we know which point to resume
processing from?
Do we mark documents that have already been processed?
For example, if tika-server runs on Docker Swarm or Kubernetes, the
orchestrator will restart a failed replica itself ...
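
For reference, here is a minimal sketch of option 2 from the quoted issue: a parent process spawns the child, pings it, and restarts it when it dies or stops answering. Everything named here (ChildSupervisor, Ping, shouldRestart, the missed-ping threshold) is hypothetical and not part of Tika. The command line that starts the child and the health check itself are left to the caller, since the thread does not specify them.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical parent-side supervisor; illustrative only, not Tika's API. */
final class ChildSupervisor {

    /** Assumed health check, e.g. a status ping from parent to child. */
    public interface Ping {
        boolean isHealthy();
    }

    private final List<String> command;
    private final Ping ping;
    private final long pingIntervalMillis;
    private final int maxMissedPings;

    ChildSupervisor(List<String> command, Ping ping,
                    long pingIntervalMillis, int maxMissedPings) {
        this.command = command;
        this.ping = ping;
        this.pingIntervalMillis = pingIntervalMillis;
        this.maxMissedPings = maxMissedPings;
    }

    /** Restart when the child has exited or has missed too many pings in a row. */
    public static boolean shouldRestart(boolean childAlive, int missedPings,
                                        int maxMissedPings) {
        return !childAlive || missedPings >= maxMissedPings;
    }

    /**
     * Starts the child at most maxStarts times, replacing it whenever it
     * exits or stops answering pings. Returns the number of children started.
     */
    public int supervise(int maxStarts) throws IOException, InterruptedException {
        int starts = 0;
        while (starts < maxStarts) {
            Process child = new ProcessBuilder(command).inheritIO().start();
            starts++;
            int missed = 0;
            while (!shouldRestart(child.isAlive(), missed, maxMissedPings)) {
                // waitFor returns true if the child exited during the interval.
                if (child.waitFor(pingIntervalMillis, TimeUnit.MILLISECONDS)) {
                    break;
                }
                // A successful ping resets the counter; a failed one increments it.
                missed = ping.isHealthy() ? 0 : missed + 1;
            }
            if (child.isAlive()) {
                // Unresponsive child (e.g. stuck in GC or a loop): kill it first.
                child.destroyForcibly();
                child.waitFor();
            }
        }
        return starts;
    }
}
```

The sketch only covers detection and restart. It does not say which document to resume from, and under an orchestrator that already restarts failed replicas the restart loop would largely duplicate that behaviour.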


On Thu, Sep 6, 2018 at 4:58 PM Tim Allison (JIRA) <jira@apache.org> wrote:

>
>     [
> https://issues.apache.org/jira/browse/TIKA-2725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16605816#comment-16605816
> ]
>
> Tim Allison commented on TIKA-2725:
> -----------------------------------
>
> From [~oleg@apache.org] on the dev list:
>
> bq. What if watcher thread fails/gets stuck etc?
>
> To confirm, that's the watcher thread in the child process.  Y, that's why
> I think we should also have a ping from the parent process.  WDYT?
>
> > Make tika-server robust against ooms/infinite loops/memory leaks
> > ----------------------------------------------------------------
> >
> >                 Key: TIKA-2725
> >                 URL: https://issues.apache.org/jira/browse/TIKA-2725
> >             Project: Tika
> >          Issue Type: Task
> >            Reporter: Tim Allison
> >            Assignee: Tim Allison
> >            Priority: Major
> >
> > Currently, tika-server is vulnerable to ooms, infinite loops and memory
> leaks.  I see two ways of making it robust:
> > 1) use the ForkParser
> > 2) have tika-server spawn a child process that actually runs the server,
> put a watcher thread in the child that will kill the child on
> oom/timeout/after x files.  The parent process can then restart the child
> if it dies.
> > I somewhat prefer 2) so that we don't have to doubly pass the
> inputstream.  I propose 2), and I propose making it optional in Tika 1.x,
> but then the default in Tika 2.x.  We could also add a status ping from
> parent to child in case the child gets caught up in stop the world gc (h/t
> [~bleskes]).
> > Other options/recommendations?
>
