tika-dev mailing list archives

From "Jukka Zitting (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (TIKA-645) Parsers can't get at an underlying TikaInputStream to get the file if they wanted one
Date Thu, 19 May 2011 14:01:48 GMT

     [ https://issues.apache.org/jira/browse/TIKA-645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Jukka Zitting resolved TIKA-645.

    Resolution: Fixed

The problem was that when the parser was using TikaInputStream.getFile(), no bytes were recorded
as being read from the stream, so the SecureContentHandler couldn't figure out where all
the output was coming from.

In revision 1124788 I changed the logic a bit so that when the stream is based on a file,
the SecureContentHandler class looks at the total size of the input file instead of the number
of bytes read from the input stream.
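
To illustrate the idea, here is a minimal sketch of such a check. It is not the code from revision 1124788: the class OutputRatioCheck, its method names, and the threshold and ratio parameters are all hypothetical, and it only shows the general shape of comparing output size against either the file length or the bytes read.

```java
import java.io.File;

/** Hypothetical sketch only; not Tika's actual SecureContentHandler. */
final class OutputRatioCheck {
    private final long threshold; // output at or below this many chars is never flagged
    private final long maxRatio;  // allowed output chars per input byte

    OutputRatioCheck(long threshold, long maxRatio) {
        this.threshold = threshold;
        this.maxRatio = maxRatio;
    }

    /**
     * Picks the input size to compare against: the file length when the
     * stream is file-backed, otherwise the number of bytes read so far.
     */
    static long effectiveInputBytes(File backingFile, long bytesReadFromStream) {
        if (backingFile != null) {
            return backingFile.length();
        }
        return bytesReadFromStream;
    }

    /** True when the output is large and out of proportion to the input. */
    boolean isSuspicious(long outputChars, long inputBytes) {
        return outputChars > threshold && outputChars > inputBytes * maxRatio;
    }
}
```

With a byte count of zero (the situation when the parser reads the file directly), any output over the threshold looks suspicious; substituting the file length gives the check a meaningful input size again.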

> Parsers can't get at an underlying TikaInputStream to get the file if they wanted one
> -------------------------------------------------------------------------------------
>                 Key: TIKA-645
>                 URL: https://issues.apache.org/jira/browse/TIKA-645
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.9
>            Reporter: Nick Burch
>            Assignee: Jukka Zitting
>             Fix For: 1.0
> Spotted this with the office parser, but it should be general. The user creates a TikaInputStream,
> and passes that off to the parser framework. The Parser that is called may wish to spot that
> the input is a File backed TikaInputStream, and take a shortcut to use the file instead of
> the InputStream.
> However, what the parser gets is a TaggedInputStream wrapping a CountingInputStream wrapping
> the original TikaInputStream. As such, it can't get at the file.
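
The wrapping problem described in the report can be reproduced with plain java.io classes. The sketch below uses hypothetical stand-ins (FileBackedStream for a file-backed TikaInputStream, Wrapper for TaggedInputStream and CountingInputStream); none of these are Tika classes, and fileShortcut is just an assumed example of the kind of type check a parser might make.

```java
import java.io.File;
import java.io.FilterInputStream;
import java.io.InputStream;

final class WrappingDemo {
    /** Hypothetical stand-in for a file-backed TikaInputStream. */
    static final class FileBackedStream extends FilterInputStream {
        private final File file;

        FileBackedStream(InputStream in, File file) {
            super(in);
            this.file = file;
        }

        File getFile() {
            return file;
        }
    }

    /** Hypothetical stand-in for TaggedInputStream or CountingInputStream. */
    static final class Wrapper extends FilterInputStream {
        Wrapper(InputStream in) {
            super(in);
        }
    }

    /** Returns the backing file only if the stream's own type exposes it. */
    static File fileShortcut(InputStream stream) {
        if (stream instanceof FileBackedStream) {
            return ((FileBackedStream) stream).getFile();
        }
        return null; // the wrappers hide the file-backed stream from the type check
    }
}
```

Calling fileShortcut on the original stream returns the file, while calling it on new Wrapper(new Wrapper(original)) returns null, which is the situation the parser was in.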

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
