tika-dev mailing list archives

From "Nick Burch (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-1867) Tika external parsers cannot be turned off without patching the tika-app-XX.jar
Date Wed, 10 May 2017 21:16:04 GMT

    [ https://issues.apache.org/jira/browse/TIKA-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16005441#comment-16005441 ]

Nick Burch commented on TIKA-1867:

I've just tried with your config file and the Tika App. I'm seeing it correctly exclude the
External Parser as I'd expect:

$ tika --list-parsers | grep External
         org.apache.tika.parser.external.CompositeExternalParser (Composite Parser):
$ tika --config=/tmp/tika-config-1867.xml --list-parsers | grep Ext

Make sure you're correctly initialising your {{TikaConfig}} object from your config file,
and use the approaches documented in https://wiki.apache.org/tika/Troubleshooting%20Tika#Identifying_what_Parsers_your_Tika_install_supports
to check which parsers you do and don't have.
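
The contents of /tmp/tika-config-1867.xml are not quoted in this message. As a rough sketch only, assuming the file relies on Tika's standard parser-exclude mechanism, a config that drops the external parser from the default parser set might look something like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch; the actual config attached to TIKA-1867 may differ. -->
<properties>
  <parsers>
    <!-- Load every parser that DefaultParser would normally discover... -->
    <parser class="org.apache.tika.parser.DefaultParser">
      <!-- ...except the composite external parser (ffmpeg, exiftool checks). -->
      <parser-exclude class="org.apache.tika.parser.external.CompositeExternalParser"/>
    </parser>
  </parsers>
</properties>
```

With a file along these lines passed via --config, the second --list-parsers command above would be expected to print no External entries, which matches the empty output shown.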

> Tika external parsers cannot be turned off without patching the tika-app-XX.jar
> -------------------------------------------------------------------------------
>                 Key: TIKA-1867
>                 URL: https://issues.apache.org/jira/browse/TIKA-1867
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.11
>            Reporter: Roman Kratochvil
> The CompositeExternalParser calls ExternalParsersFactory.create(), which always uses the configuration
> from org/apache/tika/parser/external/tika-external-parsers.xml. The issue is that this introduces
> a performance regression, as the parser initialization checks for the presence of external commands
> (ffmpeg, exiftool) and that takes time.
> Unfortunately, there is no way to turn off this functionality without patching the
> tika-app JAR -- one has to either change the tika-external-parsers.xml or remove the whole
> CompositeExternalParser from the list of services in /META-INF/services/org.apache.tika.parser.Parser.

This message was sent by Atlassian JIRA
