tika-dev mailing list archives

From Eric Pugh <ep...@opensourceconnections.com>
Subject Do we have a community supported approach for deploying Tika Server in production?
Date Wed, 04 Dec 2019 17:24:22 GMT
Hi all - Hoping this is a reasonable Tika-dev versus Tika-user question!

Over in Solr land there has been renewed discussion about streamlining what Solr is....  

With regard to rich content extraction and the Tika project, it seems the two ideas that
preserve the existing behavior are:

1) To convert the ExtractingRequestHandler into a Package (Plugin) for Solr.   This slims
down the standard Solr download, and *might* make it easier to update the version of Tika
+ dependent jars used?

2) The second approach is to instead require Tika-Server to be running (https://issues.apache.org/jira/browse/SOLR-7633)
and just have Solr delegate the call to Tika-Server.

I was thinking about why I like option 1 better than 2, and I think it boils down to how mature
the IT organization I am working with is.  Some IT organizations have large dev-ops teams
and work at major scale, so managing a fleet of Tika-Server instances on Kubernetes, with a load
balancer and dynamic scaling up and down, is simple and second nature!  However, many organizations
aren’t like that.

So I guess what I’m asking is: do we have a reasonable, supported approach for deploying Tika
Server for organizations that aren’t Tika-savvy?   I’m thinking about Solr, and specifically the
fact that Solr has a well-defined set of Service Installation scripts.   When I follow the
directions in https://lucene.apache.org/solr/guide/8_3/taking-solr-to-production.html#taking-solr-to-production
I can feel confident that when the server is rebooted, Solr will come back up!   Plus
there is log rotation and all the rest.

In contrast, when I look at the Tika website, specifically the https://tika.apache.org/1.22/gettingstarted.html
page, the message is to run Tika as a command line application, or embedded in your application.

I’m wondering if Tika-Server needs to be made more prominent, and treated as the “primary
method of interacting with Tika”?   Do we need as a community to focus more on Tika-Server:
in our getting started documentation, in our usage documentation, and in our examples?

Do we need to create the equivalent of the Service Installation scripts for Tika-Server? 

Wanted to stoke the discussion!


Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com
<http://www.opensourceconnections.com/> | My Free/Busy <http://tinyurl.com/eric-cal>
Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>

