tika-dev mailing list archives

From "Nick Burch (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-655) IWorkPackageParser / IWorkParser not registering properly
Date Fri, 06 May 2011 03:45:03 GMT

    [ https://issues.apache.org/jira/browse/TIKA-655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029720#comment-13029720 ]

Nick Burch commented on TIKA-655:
---------------------------------

In r1100039, I've moved the iWork detection logic from ZipContainerDetector to IWorkPackageParser,
and made it detect the type in a similar way to OfficeParser.

I then put the content handler selection logic into IWorkPackageParser and removed IWorkParser
(which claimed to be a regular parser but in fact only worked when called from IWorkPackageParser).
The result is that the Tika app can now parse iWork files, and the unit tests still pass.
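
For illustration only, here is a minimal plain-Java sketch of why the package-level
parser has to pick out the XML part itself. It is not Tika's code: the class name
IWorkZipSketch, the helper names, the single entry name "index.apxl" and the tiny
sample document are all assumptions made for the example. It shows that handing the
raw zip bytes to an XML parser is rejected (the zip starts with "PK", not XML), while
walking the zip and parsing only the content entry works.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXException;

// Hypothetical illustration class, not part of Tika.
class IWorkZipSketch {
    // Assumed name of the XML part inside the package (illustrative only).
    static final String CONTENT_ENTRY = "index.apxl";

    /** Parses the given bytes as XML and returns the root element name. */
    static String rootElement(byte[] xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml))
                .getDocumentElement().getNodeName();
    }

    /** Returns true if the XML parser rejects the bytes. */
    static boolean failsAsXml(byte[] data) throws Exception {
        try {
            rootElement(data);
            return false;
        } catch (SAXException e) {
            return true;
        }
    }

    /** Finds the content entry in the zip and parses only that part. */
    static String rootElementOfPackage(byte[] zipBytes) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (CONTENT_ENTRY.equals(entry.getName())) {
                    // Copy just this entry's bytes before handing them to the XML parser.
                    ByteArrayOutputStream part = new ByteArrayOutputStream();
                    byte[] buf = new byte[4096];
                    int n;
                    while ((n = zip.read(buf)) != -1) {
                        part.write(buf, 0, n);
                    }
                    return rootElement(part.toByteArray());
                }
            }
        }
        return null; // no recognised content entry found
    }

    /** Builds a tiny made-up package in memory for the demonstration. */
    static byte[] samplePackage() throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry(CONTENT_ENTRY));
            zip.write("<presentation/>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return out.toByteArray();
    }
}
```

Calling failsAsXml(samplePackage()) corresponds to the broken registration, where the
XML-level parser was given the whole file; rootElementOfPackage(samplePackage())
corresponds to letting the package-level parser select the zip part first.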


> IWorkPackageParser / IWorkParser not registering properly
> ---------------------------------------------------------
>
>                 Key: TIKA-655
>                 URL: https://issues.apache.org/jira/browse/TIKA-655
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.9
>            Reporter: Nick Burch
>            Assignee: Nick Burch
>             Fix For: 1.0
>
>
> If you try to use AutoDetectParser to handle an iWork document, it'll fail with:
>  org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
> 	at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:198)
> However, IWorkPackageParser works fine. It seems that IWorkParser needs just the individual
> zip part, but is registered as the handler for the individual mime types, so it breaks.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
