tika-dev mailing list archives

From "Daniel Conn (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-1867) Tika external parsers cannot be turned off without patching the tika-app-XX.jar
Date Wed, 10 May 2017 17:00:09 GMT

    [ https://issues.apache.org/jira/browse/TIKA-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16005013#comment-16005013 ]

Daniel Conn commented on TIKA-1867:

Hi [~gagravarr],

After trying the link earlier, I tried to exclude this parser, but Tika still seems to be
calling it and, in turn, trying to call ffmpeg and exiftool because of the
tika-external-parsers.xml file. I too am looking for a solution to this. Perhaps Tika could
check once on startup whether these programs exist and cache the result, instead of running
many checks for the same program? Or a TikaConfig constructor that lets you explicitly
remove parsers? These are just ideas though!
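
To make the "check once and cache" idea concrete, here is a rough sketch in plain Java. It is not Tika code: the class name CommandCheckCache, the PROBES counter, and the 10-second timeout are all made up for illustration, and treating exit code 0 as "available" is an assumption that may not match how Tika's own check decides.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of Tika: remembers whether an external
// command could be started so each command is probed at most once per JVM.
final class CommandCheckCache {
    private static final Map<String, Boolean> CACHE = new ConcurrentHashMap<>();

    // Counts real probes; only here so the caching effect can be observed.
    static final AtomicInteger PROBES = new AtomicInteger();

    private CommandCheckCache() {
    }

    // Returns the cached answer if present, otherwise probes once and stores it.
    static boolean isAvailable(String... command) {
        String key = String.join(" ", command);
        return CACHE.computeIfAbsent(key, k -> probe(command));
    }

    // Starts the command and reports whether it ran and exited with code 0.
    // Assumption: exit code 0 means "present"; the timeout value is arbitrary.
    private static boolean probe(String... command) {
        PROBES.incrementAndGet();
        try {
            Process process = new ProcessBuilder(command)
                    .redirectErrorStream(true)
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                    .start();
            if (!process.waitFor(10, TimeUnit.SECONDS)) {
                process.destroyForcibly();
                return false;
            }
            return process.exitValue() == 0;
        } catch (IOException e) {
            // Typically thrown when the executable is not on the PATH.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With something like this, CommandCheckCache.isAvailable("ffmpeg", "-version") would start ffmpeg the first time only, and later calls with the same arguments would be answered from the map. Negative results are cached too, so a missing program is not searched for again.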

Just in case I've got the wrong end of the stick, here is what I put in the config file. Could
you kindly confirm whether this is correct, or tell me where I'm going wrong:

<?xml version="1.0" encoding="UTF-8"?>
<properties><parsers>
    <parser class="org.apache.tika.parser.DefaultParser">
      <parser-exclude class="org.apache.tika.parser.external.CompositeExternalParser"/>
    </parser>
</parsers></properties>

Thanks and Kind Regards


> Tika external parsers cannot be turned off without patching the tika-app-XX.jar
> -------------------------------------------------------------------------------
>                 Key: TIKA-1867
>                 URL: https://issues.apache.org/jira/browse/TIKA-1867
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.11
>            Reporter: Roman Kratochvil
> The CompositeExternalParser calls ExternalParsersFactory.create(), which always uses the
> configuration from org/apache/tika/parser/external/tika-external-parsers.xml. This introduces
> a performance regression, because parser initialization checks for the presence of external
> commands (ffmpeg, exiftool), and that takes time.
> Unfortunately, there is no way to turn off this functionality without patching the
> tika-app JAR: one has to either change tika-external-parsers.xml or remove the whole
> CompositeExternalParser from the list of services in /META-INF/services/org.apache.tika.parser.Parser.

This message was sent by Atlassian JIRA
