tika-dev mailing list archives

From "Jukka Zitting (JIRA)" <j...@apache.org>
Subject [jira] Reopened: (TIKA-402) Support for iWork documents
Date Tue, 06 Jul 2010 13:19:49 GMT

     [ https://issues.apache.org/jira/browse/TIKA-402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting reopened TIKA-402:
--------------------------------


Reopening for a minor test failure on Java 5; see revision 960892. It looks like in some cases
the parser loses whitespace between words. This is probably related to the way the XML parser
works in the underlying Java version, perhaps a difference in whether that whitespace is reported
through characters() or ignorableWhitespace() calls.
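
To illustrate the suspected cause, here is a minimal sketch of a SAX handler that forwards
ignorableWhitespace() events to characters(), so that whitespace between words survives no matter
which callback the parser uses. The class name WhitespacePreservingHandler and its extractText()
helper are made up for this example; this is not the actual Tika code or the fix applied for
this issue, only one way the distinction could be neutralized.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler, not part of Tika: collects text content and treats
// "ignorable" whitespace the same as ordinary character data.
class WhitespacePreservingHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) {
        // A parser may report whitespace here instead of through
        // characters(); forward it so it is not silently dropped.
        characters(ch, start, length);
    }

    String getText() {
        return text.toString();
    }

    // Convenience entry point (assumed for this sketch): parse an XML
    // string with the default JAXP SAX parser and return the collected text.
    static String extractText(String xml) throws Exception {
        WhitespacePreservingHandler handler = new WhitespacePreservingHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getText();
    }
}
```

With a handler like this, the extracted text should not depend on which of the two callbacks
a given XML parser chooses for whitespace.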

> Support for iWork documents
> ---------------------------
>
>                 Key: TIKA-402
>                 URL: https://issues.apache.org/jira/browse/TIKA-402
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Jukka Zitting
>            Assignee: Jukka Zitting
>             Fix For: 0.8
>
>         Attachments: iwork.patch, iwork.patch, iwork.patch, iwork.patch, iwork.patch,
>                      iwork.patch, testKeynote.key, testKeynote.key, testNumbers.numbers, testPages.pages
>
>
> It would be nice to have support for documents created by Apple's Keynote and Pages applications.
> Both file formats are described in
> http://developer.apple.com/mac/library/documentation/AppleApplications/Conceptual/iWork2-0_XML/Chapter01/Introduction.html.
> I'm not sure if there already are open source parser libraries for these formats or if we'd
> need to directly process the XML content.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

