tika-dev mailing list archives

From "Jukka Zitting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (TIKA-402) Support for iWork documents
Date Wed, 07 Jul 2010 07:21:54 GMT

    [ https://issues.apache.org/jira/browse/TIKA-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885862#action_12885862 ]

Jukka Zitting commented on TIKA-402:

BTW, the XHTMLContentHandler is an extended ContentHandler, so you can also use lower-level
methods like characters(char[],int,int). I did that in revision 961266 to avoid having to
instantiate extra String objects in the iWork content handlers.
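
As a rough illustration of the point above (this is not the code from revision 961266), the
sketch below uses only the JDK's SAX classes. ForwardingHandler and CollectingHandler are
hypothetical names made up for this example; in Tika the downstream handler would be an
XHTMLContentHandler, which is assumed here to behave like any other ContentHandler for this call.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class CharactersPassThrough {

    /** Hypothetical stand-in for a format-specific handler that emits text downstream. */
    static class ForwardingHandler extends DefaultHandler {
        private final ContentHandler downstream;

        ForwardingHandler(ContentHandler downstream) {
            this.downstream = downstream;
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            // Hand the parser's buffer slice straight to the downstream handler.
            // The alternative, downstream.characters(new String(ch, start, length)),
            // would allocate a temporary String for every text chunk.
            downstream.characters(ch, start, length);
        }
    }

    /** Hypothetical sink that records the text it receives, for demonstration only. */
    static class CollectingHandler extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /** Parses the XML and returns all character data that reached the downstream handler. */
    static String extract(String xml) {
        CollectingHandler sink = new CollectingHandler();
        try {
            SAXParserFactory.newInstance()
                    .newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new ForwardingHandler(sink));
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return sink.text.toString();
    }

    public static void main(String[] args) {
        System.out.println(extract("<doc><p>Hello</p><p> iWork</p></doc>"));
    }
}
```

The parser may deliver text in several chunks, so a handler written this way should not assume
that one characters() call corresponds to one complete text node.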

> Support for iWork documents
> ---------------------------
>                 Key: TIKA-402
>                 URL: https://issues.apache.org/jira/browse/TIKA-402
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Jukka Zitting
>            Assignee: Jukka Zitting
>             Fix For: 0.8
>         Attachments: iwork.patch, iwork.patch, iwork.patch, iwork.patch, iwork.patch,
>                      iwork.patch, iwork.patch, testKeynote.key, testKeynote.key,
>                      testNumbers.numbers, testPages.pages
>
> It would be nice to have support for documents created by Apple's Keynote and Pages applications.
> Both file formats are described in http://developer.apple.com/mac/library/documentation/AppleApplications/Conceptual/iWork2-0_XML/Chapter01/Introduction.html.
> I'm not sure if there already are open source parser libraries for these formats or if we'd
> need to directly process the XML content.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
