tika-dev mailing list archives

From Nick Burch <apa...@gagravarr.org>
Subject Re: [DISCUSS] Enable specific ContentHandler for tika-server
Date Fri, 29 Sep 2017 18:05:35 GMT
On Fri, 29 Sep 2017, Giuseppe Totaro wrote:
> To sum up, I would like to quickly discuss the following aspects:
>
>   - As you all mentioned, the HTTP headers for configuring the
>   ContentHandler to be used are better suited for the dynamic cases.
>   Specifically, a ContentHandler can be given through an ad-hoc header, e.g.
>   -H "X-Content-Handler: StandardsExtractingContentHandler", parsed and used
>   at run time within tika-server.
>   - Nick, I believe that providing the ability to determine the
>   ContentHandler through a command-line option is a great idea. It could
>   also be better for users.

To keep the headers / options short, I'd suggest that you check whether the
value given contains a ".". If it does, treat it as a fully qualified class
name. If it doesn't, prefix it with "org.apache.tika.sax." so that just the
short class name can be used for Tika's built-in handlers.
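
Roughly, the check could look like the sketch below. HandlerNameResolver and
resolve() are hypothetical names for illustration, not existing Tika code. It
only turns the header / option value into a class name and does not load the
class.

```java
// Hypothetical helper, not part of Tika: maps a header / option value
// to the class name of the ContentHandler to use.
final class HandlerNameResolver {

    // Package assumed for short names, i.e. Tika's built-in SAX handlers.
    private static final String DEFAULT_PACKAGE = "org.apache.tika.sax";

    private HandlerNameResolver() {
    }

    static String resolve(String value) {
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("No ContentHandler name given");
        }
        String name = value.trim();
        // A "." means the caller gave a fully qualified class name: use as-is.
        // Otherwise treat it as a short name and add the default package.
        return name.contains(".") ? name : DEFAULT_PACKAGE + "." + name;
    }
}
```

With this, "BodyContentHandler" resolves to
org.apache.tika.sax.BodyContentHandler, while a value such as
"com.example.MyHandler" is left untouched. Since the value may come from an
HTTP header, whatever loads the class would probably also want to check that
it really is an org.xml.sax.ContentHandler before instantiating it.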

Nick
