tika-dev mailing list archives

From "Jukka Zitting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (TIKA-402) Support for iWork documents
Date Wed, 07 Jul 2010 07:21:54 GMT

    [ https://issues.apache.org/jira/browse/TIKA-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885862#action_12885862 ]

Jukka Zitting commented on TIKA-402:
------------------------------------

BTW, the XHTMLContentHandler is an extended ContentHandler, so you can also use lower-level
methods like characters(char[],int,int). I did that in revision 961266 to avoid having to
instantiate extra String objects in the iWork content handlers.
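
To illustrate the point about characters(char[],int,int), here is a minimal sketch using only the JDK's SAX classes. It is not the code from revision 961266: the class names CharactersForwardingSketch, CollectingHandler and ForwardingHandler are hypothetical, and CollectingHandler merely stands in for a downstream handler such as XHTMLContentHandler. The idea shown is that the parser's buffer slice is passed straight to the next handler instead of being wrapped in a new String first.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class CharactersForwardingSketch {

    // Hypothetical stand-in for the downstream handler; it only collects text.
    static class CollectingHandler extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    // Hypothetical format-specific handler that forwards character data.
    static class ForwardingHandler extends DefaultHandler {
        private final ContentHandler out;

        ForwardingHandler(ContentHandler out) {
            this.out = out;
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            // Forward the same array slice; no new String(ch, start, length) is created.
            out.characters(ch, start, length);
        }
    }

    // Parses the given XML and returns the text seen by the downstream handler.
    static String extract(String xml) throws Exception {
        CollectingHandler sink = new CollectingHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new ForwardingHandler(sink));
        return sink.text.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(extract("<p>Hello <b>iWork</b></p>"));
    }
}
```

Running main prints "Hello iWork". The alternative of building a String and calling a String-based convenience method would produce the same text but allocate one extra object per characters() callback.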

> Support for iWork documents
> ---------------------------
>
>                 Key: TIKA-402
>                 URL: https://issues.apache.org/jira/browse/TIKA-402
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Jukka Zitting
>            Assignee: Jukka Zitting
>             Fix For: 0.8
>
>         Attachments: iwork.patch, iwork.patch, iwork.patch, iwork.patch, iwork.patch,
> iwork.patch, iwork.patch, testKeynote.key, testKeynote.key, testNumbers.numbers, testPages.pages
>
>
> It would be nice to have support for documents created by Apple's Keynote and Pages applications.
> Both file formats are described in http://developer.apple.com/mac/library/documentation/AppleApplications/Conceptual/iWork2-0_XML/Chapter01/Introduction.html.
> I'm not sure if there already are open source parser libraries for these formats or if we'd
> need to directly process the XML content.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.