jackrabbit-oak-issues mailing list archives

From "Thomas Mueller (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (OAK-5048) Upgrade to Tika 1.15 version
Date Mon, 03 Jul 2017 15:10:00 GMT

    [ https://issues.apache.org/jira/browse/OAK-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16072597#comment-16072597 ]

Thomas Mueller commented on OAK-5048:
-------------------------------------

> It triggers opening of the underlying stream which in case of S3DataStore would trigger
> fetching of whole file

Do you know whether this is LazyInputStream? I see that even though this one is called "Lazy",
it's not lazy when markSupported is called...

(Just an idea, not sure if it can be done) Maybe if we wrap the LazyInputStream in a regular
BufferedInputStream, this is resolved, because BufferedInputStream.markSupported always
returns true (without calling the filtered input stream).
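
A minimal sketch of that idea, under stated assumptions: OpenOnDemandStream below is a
hypothetical stand-in written for this example, not Oak's actual LazyInputStream, and a
ByteArrayInputStream stands in for the S3-backed stream. It only illustrates that calling
markSupported on the BufferedInputStream wrapper does not touch the wrapped stream.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class MarkSupportedSketch {

    /** Hypothetical stand-in for a lazily opened stream; NOT Oak's real class. */
    static class OpenOnDemandStream extends InputStream {
        private final byte[] data;
        private InputStream delegate; // created on first access

        OpenOnDemandStream(byte[] data) {
            this.data = data;
        }

        boolean isOpened() {
            return delegate != null;
        }

        private InputStream open() {
            if (delegate == null) {
                // Assumption: in the S3DataStore case, this is the point
                // where the whole file would be fetched.
                delegate = new ByteArrayInputStream(data);
            }
            return delegate;
        }

        @Override
        public int read() throws IOException {
            return open().read();
        }

        // Mimics the behaviour described above: asking about mark support
        // forces the underlying stream to be opened.
        @Override
        public boolean markSupported() {
            return open().markSupported();
        }
    }

    public static void main(String[] args) throws IOException {
        // Direct call: the stream gets opened just to answer markSupported().
        OpenOnDemandStream direct = new OpenOnDemandStream(new byte[] {1, 2, 3});
        direct.markSupported();
        System.out.println("direct call opened stream: " + direct.isOpened());

        // Wrapped call: BufferedInputStream answers by itself.
        OpenOnDemandStream lazy = new OpenOnDemandStream(new byte[] {1, 2, 3});
        InputStream wrapped = new BufferedInputStream(lazy);
        System.out.println("wrapped markSupported: " + wrapped.markSupported());
        System.out.println("wrapped call opened stream: " + lazy.isOpened());
    }
}
```

Note that the wrapper only defers the open: the first read through the BufferedInputStream
still opens the underlying stream, so this helps only if the caller checks markSupported
without necessarily reading.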

> Upgrade to Tika 1.15 version
> ----------------------------
>
>                 Key: OAK-5048
>                 URL: https://issues.apache.org/jira/browse/OAK-5048
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene
>            Reporter: Tommaso Teofili
>            Assignee: Chetan Mehrotra
>             Fix For: 1.8
>
>
> Oak Lucene index is currently using Tika 1.5 version while current latest release of
> Apache Tika is 1.14, I think there're lots of "interesting" bugs fixed, and possibly improvements
> (performance, more accurate text extraction, etc.) we could get at almost 0 cost by just bumping
> the version number.
> Release notes https://tika.apache.org/1.15/index.html



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
