tika-dev mailing list archives

From "Nick Burch (JIRA)" <j...@apache.org>
Subject [jira] Commented: (TIKA-552) Further improvements to Word .doc and .docx parsing
Date Thu, 10 Feb 2011 13:55:57 GMT

    [ https://issues.apache.org/jira/browse/TIKA-552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12993055#comment-12993055 ]

Nick Burch commented on TIKA-552:
---------------------------------

We need to get another POI release out, then bump the POI dependency, before the next few
bits can be fixed. I'd suggest we leave this issue open for now, until we do that.

> Further improvements to Word .doc and .docx parsing
> ---------------------------------------------------
>
>                 Key: TIKA-552
>                 URL: https://issues.apache.org/jira/browse/TIKA-552
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 0.8
>            Reporter: Nick Burch
>            Assignee: Nick Burch
>             Fix For: 0.9
>
>
> This is a follow-on to TIKA-506, to track the enhancements to .doc and .docx parsing
> between 0.8 and 0.9.
> The list includes:
> * Anchors and bookmarks
> * Floating Word .doc pictures (\u0008 rather than \u0001)
> * Nested Word .doc tables
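
As a rough illustration of the floating-picture item above: in .doc text, an inline picture is marked by the placeholder character U+0001, while a floating (drawn) object is marked by U+0008. The sketch below only scans a string for those two characters. The class and method names are hypothetical and are not part of Tika or POI, and it assumes the caller has already established that the run is flagged as "special", since these characters are only placeholders in such runs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not Tika or POI code.
class PicturePlaceholders {
    // Placeholder for an inline picture in .doc run text (U+0001).
    static final char INLINE_PICTURE = 0x01;
    // Placeholder for a floating / drawn object in .doc run text (U+0008).
    static final char FLOATING_PICTURE = 0x08;

    // Returns one entry per placeholder found, e.g. "inline@2",
    // where the number is the character offset within the text.
    // Assumes the text comes from a run already known to be "special".
    static List<String> findPlaceholders(String text) {
        List<String> found = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == INLINE_PICTURE) {
                found.add("inline@" + i);
            } else if (c == FLOATING_PICTURE) {
                found.add("floating@" + i);
            }
        }
        return found;
    }
}
```

A parser that only looked for U+0001 would find the first kind of picture and silently skip the second, which is the gap the list item describes.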

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
