tika-dev mailing list archives

From "Nick Burch (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-826) TikaException / OfficeXmlFileException with .xlsb files
Date Tue, 03 Jan 2012 05:14:21 GMT

    [ https://issues.apache.org/jira/browse/TIKA-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178632#comment-13178632 ]

Nick Burch commented on TIKA-826:

This should be fixed in r1226651. Neither parser now claims the format, and if a .xlsb file
reaches the OOXML parser on the basis of its parent type, the parser declines it. Tests have
also been added for these cases.
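
The dispatch behaviour described above can be sketched roughly as follows. This is a simplified
model, not Tika's actual code: the class and function names, the "example/..." type strings, and
the parent-type table are all hypothetical and exist only to illustrate "no parser claims the
type, and the parser reached through the parent type declines it".

```python
# Hypothetical model of parser selection; none of these names are Tika's API.
XLS = "example/xls"
XLSX = "example/xlsx"
XLSB = "example/xlsb"
OOXML = "example/ooxml"

# Assumed type hierarchy: child media type -> parent media type.
PARENT = {XLSX: OOXML, XLSB: OOXML}


class UnsupportedFormat(Exception):
    """Raised by a parser that is handed a type it explicitly declines."""


class SketchParser:
    def __init__(self, name, claimed, declined=frozenset()):
        self.name = name
        self.claimed = set(claimed)    # types the parser advertises
        self.declined = set(declined)  # types it refuses even if reached

    def parse(self, media_type):
        if media_type in self.declined:
            raise UnsupportedFormat(media_type)
        return f"{self.name} parsed {media_type}"


def select_parser(parsers, media_type):
    """Walk up the type hierarchy until some parser claims the type."""
    current = media_type
    while current is not None:
        for parser in parsers:
            if current in parser.claimed:
                return parser
        current = PARENT.get(current)
    return None


def handle(parsers, media_type):
    parser = select_parser(parsers, media_type)
    if parser is None:
        return "no parser"
    try:
        return parser.parse(media_type)
    except UnsupportedFormat:
        return "declined"


# Neither parser claims XLSB; the OOXML-style parser declines it explicitly.
PARSERS = [
    SketchParser("office", {XLS}),
    SketchParser("ooxml", {OOXML, XLSX}, declined={XLSB}),
]
```

In this model, handle(PARSERS, XLSB) reaches the "ooxml" parser only through the parent type and
is then declined, while handle(PARSERS, XLSX) is parsed normally.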
> TikaException / OfficeXmlFileException with .xlsb files
> -------------------------------------------------------
>                 Key: TIKA-826
>                 URL: https://issues.apache.org/jira/browse/TIKA-826
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.1
>         Environment: Windows 7
>            Reporter: John Mastarone
>             Fix For: 1.1
>         Attachments: TIKA-826.patch
> The file testEXCEL.xlsb in the tika-parsers test-documents folder causes a POI OfficeXmlFileException
when opened with TikaGUI or TikaCLI using the latest build. The reason: Tika is configured to
open it with the OfficeParser class rather than the OOXMLParser class, but it is an Office 2007
file and should be opened with the OOXMLParser class. Neither the ExcelParserTest class nor the
OOXMLParserTest class covers .xlsb files. Once these two parsers are changed so that the
OOXMLParser is used (I'll submit a patch shortly for these), the OfficeXmlFileException goes away
and a new POI exception (IllegalArgumentException in the ExtractorFactory class) arises in its
place. It is somewhat related to unresolved POI bug 51921, whose reporter mentions a .xlsb file
among others. This exception appears to occur because POI does not seem to handle .xlsb files
at all. A cursory search of the POI source for "xlsb" or its MIME type yields nothing relevant,
and the project has no .xlsb test files that I can see.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

