jackrabbit-oak-issues mailing list archives

From "Julian Reschke (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (OAK-6414) Use Tika config to determine non indexed mimeTypes
Date Fri, 05 Jul 2019 09:20:00 GMT

    [ https://issues.apache.org/jira/browse/OAK-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16879098#comment-16879098 ]

Julian Reschke commented on OAK-6414:
-------------------------------------

trunk: (1.7.4) [r1800749|http://svn.apache.org/r1800749] [r1800742|http://svn.apache.org/r1800742] [r1800726|http://svn.apache.org/r1800726]
1.6: [r1862598|http://svn.apache.org/r1862598]


> Use Tika config to determine non indexed mimeTypes
> --------------------------------------------------
>
>                 Key: OAK-6414
>                 URL: https://issues.apache.org/jira/browse/OAK-6414
>             Project: Jackrabbit Oak
>          Issue Type: Technical task
>          Components: lucene
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>            Priority: Major
>              Labels: candidate_oak_1_6
>             Fix For: 1.7.4, 1.8.0
>
>
> With OAK-2895, support was added to avoid loading binary content whose mimeTypes have been
> excluded from indexing by configuring EmptyParser against them. That approach used a
> lazyInputStream and relied on the fact that Tika would not access the stream if none of the
> parsers was going to touch that file.
> However, as seen while upgrading to Tika 1.15, Tika now [checks whether the InputStream supports marking|https://github.com/apache/tika/commit/896c46a0c652de436da0e4f25bfa53a7d83ae02f].
>
> To support this change, the logic on the Oak side needs to check explicitly, by reading
> tika-config.xml, which mimeTypes have been configured with EmptyParser.
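
For illustration only, the explicit check described above could be sketched roughly as follows. This is not the code committed in the revisions listed in this message. It assumes the usual tika-config.xml layout, where a <parser class="..."> element lists its types in <mime> children, and it uses only the JDK XML parser. The class name EmptyParserMimeTypes and the method nonIndexedMimeTypes are hypothetical names chosen for this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Oak or Tika.
class EmptyParserMimeTypes {

    static final String EMPTY_PARSER = "org.apache.tika.parser.EmptyParser";

    // Returns the mime types listed under <parser> elements whose class
    // attribute is EmptyParser. Assumes the standard tika-config.xml layout.
    static Set<String> nonIndexedMimeTypes(InputStream tikaConfig) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        Document doc = factory.newDocumentBuilder().parse(tikaConfig);

        Set<String> result = new TreeSet<>();
        NodeList parsers = doc.getElementsByTagName("parser");
        for (int i = 0; i < parsers.getLength(); i++) {
            Element parser = (Element) parsers.item(i);
            if (!EMPTY_PARSER.equals(parser.getAttribute("class"))) {
                continue; // some other parser, its types stay indexed
            }
            NodeList mimes = parser.getElementsByTagName("mime");
            for (int j = 0; j < mimes.getLength(); j++) {
                result.add(mimes.item(j).getTextContent().trim());
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Example configuration made up for this sketch.
        String xml =
              "<properties><parsers>"
            + "<parser class=\"org.apache.tika.parser.DefaultParser\">"
            + "<mime>text/plain</mime></parser>"
            + "<parser class=\"org.apache.tika.parser.EmptyParser\">"
            + "<mime>application/zip</mime><mime>image/png</mime></parser>"
            + "</parsers></properties>";
        try (InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))) {
            System.out.println(nonIndexedMimeTypes(in));
        }
    }
}
```

With such a set available, the indexer could compare a binary's mimeType against it before opening the stream at all, so it would no longer matter whether Tika touches the InputStream. The sketch ignores other configuration elements (for example exclusions on a parser) that a complete implementation would probably need to consider.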



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
