tika-dev mailing list archives

From Thejan Wijesinghe <thejan.k.wijesin...@gmail.com>
Subject Re: experiences with Tika in Docker
Date Wed, 31 May 2017 21:40:06 GMT
Hi Tim,

I've used tika-server in Docker, but only as a single instance. Yes, the
ability to limit a container's memory and CPU usage on the host machine is
great. It gives us a lot of flexibility: we can enforce hard and soft memory
limits, and we can even control the container's share of the host machine's
CPU cycles. Yes, it also limits the risk from arbitrary code execution and
XXE vulnerabilities. I have already asked Prof. Chris Mattmann about
officially moving to Docker Hub. He said I need to send a mail to Apache
infra asking about this. Unfortunately, I haven't yet found the time to
write that mail.
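
To make the memory and CPU limits above concrete, here is a minimal sketch
in Python that only builds a `docker run` command line. The flags `--memory`
(hard limit), `--memory-reservation` (soft limit) and `--cpus` are standard
Docker options; the image name `example/tika-server` and the limit values are
placeholders I made up for illustration, and 9998 is assumed to be the port
tika-server listens on.

```python
import shlex


def docker_run_command(image, memory="2g", memory_reservation="1g",
                       cpus=2.0, port=9998):
    """Build a `docker run` argument list with memory and CPU limits.

    All default values are illustrative assumptions, not recommendations.
    """
    return [
        "docker", "run", "--rm",
        "--memory", memory,                          # hard memory limit
        "--memory-reservation", memory_reservation,  # soft memory limit
        "--cpus", str(cpus),                         # cap on CPU usage
        "-p", f"{port}:{port}",                      # publish the server port
        image,
    ]


if __name__ == "__main__":
    # "example/tika-server" is a hypothetical image name.
    cmd = docker_run_command("example/tika-server")
    # Print the command instead of running it, so nothing is started here.
    print(shlex.join(cmd))
```

Printing the command rather than executing it keeps the sketch safe to run
on a machine without Docker; the resulting line can be pasted into a shell
once the image name and limits are adjusted.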

We already have multiple Dockerfiles in Tika: the Dockerfile in tika-server,
InceptionRestDockerfile, InceptionVideoRestDockerfile, and
Im2txtRestDockerfile (PR #180, for image captioning).

Part of my GSoC project is to unify the existing REST services, such as
object recognition and image captioning. My idea is to bring them together
so that the user can start or terminate any REST service and see its
statistics through a web-based GUI. I expect to use a combination of nginx
(as the reverse proxy server) and Docker to make it work. So we will
obviously see Docker much more often in Tika.

+1 for your idea of looking into hardening tika-server with the help of
Docker.


On Thu, Jun 1, 2017 at 1:03 AM, Allison, Timothy B. <tallison@mitre.org>
wrote:

> Dave Meikle, Tom and All,
>     How many of us are using Tika in Docker?  If so, how exactly are you
> using it?  Single instance, swarm, Kubernetes, something else?  People fear
> I/O hit with tika-server...what are your experiences?
> I really like the ability to limit the number of CPUs in the Docker
> container.  If a single doc causes multithreaded gc to go nuts, that won't
> kill an entire machine.  This also cleanly limits the risk from XXE or
> arbitrary code execution, right?
> If this is one of the ways of the future for big data, we might want to
> look into hardening tika-server (OOMs, timeouts).  What do you all think?
>         Cheers,
>                 Tim
> Timothy B. Allison, Ph.D.
> Principal Artificial Intelligence Engineer
> Group Lead
> K83E/Human Language Technology
> The MITRE Corporation
> 7515 Colshire Drive, McLean, VA  22102
> 703-983-2473 (phone); 703-983-1379 (fax)
