tika-dev mailing list archives

From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-1824) Tika 2.0 - Create Initial Parser Modules
Date Wed, 13 Jan 2016 17:50:39 GMT

    [ https://issues.apache.org/jira/browse/TIKA-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15096668#comment-15096668 ]

Uwe Schindler commented on TIKA-1824:

Hi, as invited on TIKA-1830, here are some comments from Apache Solr:

As already stated in the past, we would like to bundle only parsers for text document formats,
because images, class files, and the like are not really useful for indexing by default. Users who
want to index them can still add the missing parser bundles, and SPI will do the rest. Currently
we disable some parsers by removing their JAR files (like asm-all.jar, netcdf.jar), so
TIKA's SPI disables them automatically (because of ClassNotFoundEx). This was a bit crude,
but it worked.
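
To illustrate the mechanism, here is a minimal sketch of SPI loading that skips providers whose
classes or dependencies are missing. It uses the JDK's java.util.ServiceLoader, not TIKA's own
loader code, and DemoParser, TolerantSpiLoader, and the failure cap are hypothetical names and
values chosen for the example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.ServiceConfigurationError;
import java.util.ServiceLoader;

public class TolerantSpiLoader {

    /** Hypothetical stand-in for a parser interface; this is not TIKA's Parser API. */
    public interface DemoParser {
        String name();
    }

    /** Arbitrary cap so that a provider source that keeps failing cannot loop forever. */
    private static final int MAX_FAILURES = 100;

    /**
     * Loads every provider of the given service that can be instantiated and
     * skips those that fail, for example because a dependency JAR was removed.
     */
    public static <T> List<T> loadAvailable(Class<T> service, ClassLoader loader) {
        List<T> found = new ArrayList<>();
        Iterator<T> it = ServiceLoader.load(service, loader).iterator();
        int failures = 0;
        while (failures < MAX_FAILURES) {
            try {
                if (!it.hasNext()) {
                    break;
                }
                found.add(it.next());
            } catch (ServiceConfigurationError | LinkageError e) {
                // The provider is listed in META-INF/services but cannot be loaded.
                failures++;
                System.err.println("Skipping unavailable provider: " + e.getMessage());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<DemoParser> parsers =
                loadAvailable(DemoParser.class, TolerantSpiLoader.class.getClassLoader());
        System.out.println("Loaded " + parsers.size() + " parser(s)");
    }
}
```

With this approach, removing a dependency JAR silently turns the affected provider off, which is
convenient but also hides genuine packaging mistakes, so logging the skipped providers is worthwhile.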

The reason for this was partly also some version incompatibilities (the ASM version in TIKA was
old, while Lucene needs the newest one), but ASM is not really useful for indexing anyway!

In Solr we don't use transitive dependencies in Ivy, so we decide for each JAR file whether
it gets bundled, and we check every release anyway during the update.

In addition, it would be a good idea to allow loading the TIKA SPI files in a separate classloader
(to isolate the parser classes from others). The reason for this is JAR hell. If TIKA loaded
the parsers in its own classloader (optionally, e.g. by configuration), we could place
all parsers and their dependencies in a separate lib directory outside Solr's lib folder.
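
A rough sketch of what such optional isolation could look like with plain JDK classes follows.
This is not an existing TIKA feature or API; IsolatedParserLoader, DemoParser, and the
"parser-lib" directory name are assumptions made up for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

public class IsolatedParserLoader {

    /** Hypothetical stand-in for a parser interface; this is not TIKA's Parser API. */
    public interface DemoParser {
        String name();
    }

    /** Builds a classloader over all *.jar files in libDir (none if the directory is missing). */
    public static URLClassLoader loaderForDirectory(Path libDir, ClassLoader parent) {
        List<URL> jars = new ArrayList<>();
        if (Files.isDirectory(libDir)) {
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(libDir, "*.jar")) {
                for (Path jar : stream) {
                    jars.add(jar.toUri().toURL());
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        // The parent must be able to see the service interface so that
        // providers found in the JARs implement the same class object.
        return new URLClassLoader(jars.toArray(new URL[0]), parent);
    }

    /** Discovers providers through the META-INF/services files visible to the given loader. */
    public static <T> List<T> loadServices(Class<T> service, ClassLoader loader) {
        List<T> found = new ArrayList<>();
        for (T provider : ServiceLoader.load(service, loader)) {
            found.add(provider);
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // "parser-lib" is a hypothetical directory name used only for this example.
        Path libDir = Paths.get(args.length > 0 ? args[0] : "parser-lib");
        try (URLClassLoader isolated =
                loaderForDirectory(libDir, IsolatedParserLoader.class.getClassLoader())) {
            List<DemoParser> parsers = loadServices(DemoParser.class, isolated);
            System.out.println("Loaded " + parsers.size() + " parser(s) from " + libDir);
        }
    }
}
```

Note that URLClassLoader delegates to its parent first, so this keeps the parser JARs out of
the application's lib folder but does not by itself shield them from a conflicting library
version that the parent already provides; that would need a child-first loader.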

> Tika 2.0 -  Create Initial Parser Modules
> -----------------------------------------
>                 Key: TIKA-1824
>                 URL: https://issues.apache.org/jira/browse/TIKA-1824
>             Project: Tika
>          Issue Type: Improvement
>    Affects Versions: 2.0
>            Reporter: Bob Paulin
>            Assignee: Bob Paulin
> Create initial break down of parser modules.

This message was sent by Atlassian JIRA
