tika-dev mailing list archives

From "Jukka Zitting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (TIKA-402) Support for iWork documents
Date Mon, 07 Jun 2010 21:35:44 GMT

    [ https://issues.apache.org/jira/browse/TIKA-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876431#action_12876431 ]

Jukka Zitting commented on TIKA-402:

> XML root element detection

See the org.apache.tika.detect.XmlRootExtractor class and the <root-XML/> entries in the
tika-mimetypes.xml configuration file.
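
To make the idea concrete, here is a minimal sketch of root element detection using only the JDK's StAX parser. It is not Tika's XmlRootExtractor code, just an illustration of the same technique: read the stream until the first start tag and report its namespace and local name. The class name RootElementSketch, the method rootOf, and the http://example.com/ns/demo namespace are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Tika: finds the root element of an XML document.
public class RootElementSketch {

    /** Returns the qualified name of the root element, or null if the data is not parseable XML. */
    public static QName rootOf(byte[] data) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Only the first start tag matters, so DTDs and external entities are not resolved.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        try {
            XMLStreamReader reader =
                    factory.createXMLStreamReader(new ByteArrayInputStream(data));
            try {
                while (reader.hasNext()) {
                    // Skip the XML declaration, comments and whitespace before the root.
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        return reader.getName();
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Not XML (or malformed before the root element): fall through to null.
        }
        return null;
    }

    public static void main(String[] args) {
        // Made-up document; the namespace is a placeholder, not a real iWork one.
        String xml = "<?xml version=\"1.0\"?>"
                + "<d:presentation xmlns:d=\"http://example.com/ns/demo\">"
                + "<d:slide/></d:presentation>";
        QName root = rootOf(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(root.getNamespaceURI() + " / " + root.getLocalPart());
    }
}
```

The namespace URI and local name obtained this way are the pair that a <root-XML/> entry names through its namespaceURI and localName attributes, so a detector can map an otherwise generic XML file to a more specific media type.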

> directory

My idea is that if you point a file system crawler to uncompressed iWork directories, we should
still be able to produce reasonable output when the crawler feeds the XML file to Tika.

> Support for iWork documents
> ---------------------------
>                 Key: TIKA-402
>                 URL: https://issues.apache.org/jira/browse/TIKA-402
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Jukka Zitting
>         Attachments: iwork.patch, iwork.patch, iwork.patch, iwork.patch, iwork.patch,
>                      testKeynote.key, testKeynote.key, testNumbers.numbers, testPages.pages
>
> It would be nice to have support for documents created by Apple's Keynote and Pages applications.
> Both file formats are described in http://developer.apple.com/mac/library/documentation/AppleApplications/Conceptual/iWork2-0_XML/Chapter01/Introduction.html.
> I'm not sure if there already are open source parser libraries for these formats or if we'd
> need to directly process the XML content.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
