tika-dev mailing list archives

From Jukka Zitting <jukka.zitt...@gmail.com>
Subject Re: TikaInputStream customization
Date Wed, 06 Jun 2012 10:39:33 GMT
Hi,

On Wed, Jun 6, 2012 at 12:30 PM, K, Baraneetharan
<baraneetharan.k@hp.com> wrote:
> Can anyone please let me know how to customize TikaInputStream to read only the
> first 1000 bytes from a given InputStream?

You can use the BoundedInputStream [1] class from Commons IO:

    TikaInputStream.get(new BoundedInputStream(stream, 1000));

However, see the concern in TIKA-307 [2]. Passing a truncated stream
to Tika may produce unexpected results.
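
If Commons IO is not available, the same idea can be sketched with the JDK
alone. This is only an illustration under stated assumptions: StreamPrefix and
firstBytes are hypothetical names, not part of Tika or Commons IO, and unlike
BoundedInputStream this version copies the prefix into memory rather than
wrapping the original stream lazily.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration; not part of Tika or Commons IO.
final class StreamPrefix {

    // Reads at most `limit` bytes from `in` and returns them as an in-memory stream.
    // Fewer bytes are returned if the source ends before the limit is reached.
    static ByteArrayInputStream firstBytes(InputStream in, int limit) throws IOException {
        byte[] buffer = new byte[limit];
        int filled = 0;
        while (filled < limit) {
            int n = in.read(buffer, filled, limit - filled);
            if (n < 0) {
                break; // end of the source stream
            }
            filled += n;
        }
        return new ByteArrayInputStream(buffer, 0, filled);
    }

    public static void main(String[] args) throws IOException {
        InputStream source = new ByteArrayInputStream(new byte[5000]);
        InputStream head = firstBytes(source, 1000);
        System.out.println(head.available()); // prints 1000

        // With Tika on the classpath, the prefix could then be wrapped
        // the same way as above: TikaInputStream.get(head)
    }
}
```

The truncation caveat applies equally here: the parser only ever sees the
first 1000 bytes.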

[1] http://commons.apache.org/io/api-release/org/apache/commons/io/input/BoundedInputStream.html
[2] https://issues.apache.org/jira/browse/TIKA-307

BR,

Jukka Zitting
