tika-dev mailing list archives

From "Otis Gospodnetic (JIRA)" <j...@apache.org>
Subject [jira] Commented: (TIKA-123) Structured MS Office parsing
Date Wed, 12 Nov 2008 06:17:44 GMT

    [ https://issues.apache.org/jira/browse/TIKA-123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646810#action_12646810 ]

Otis Gospodnetic commented on TIKA-123:

Is this even possible?  Does POI provide such functionality and Tika simply needs to expose

> Structured MS Office parsing
> ----------------------------
>                 Key: TIKA-123
>                 URL: https://issues.apache.org/jira/browse/TIKA-123
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>            Reporter: Jukka Zitting
>            Assignee: Jukka Zitting
> The MS Office parsers currently extract and output document content as a single string.
> We should support structured text at least down to page and paragraph (not sure how accurate)

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
