tika-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-1062) Add list detection to RTFParser
Date Thu, 24 Jan 2013 22:27:12 GMT

    [ https://issues.apache.org/jira/browse/TIKA-1062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562060#comment-13562060 ]

Michael McCandless commented on TIKA-1062:

Hi Axel,

I don't actually know whether Tika has adopted an official code style (anyone?).  Really I was just
carrying forward Lucene's code style (put {} around even single-line code blocks to reduce
the risk of future bugs).  You succeeded very well, and, yes, the current code style "varies" :)
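
Lucene's convention of bracing even single-line blocks guards against a familiar Java pitfall: a statement added later beneath an unbraced `if` is indented as if it were conditional, but it always runs. The hypothetical class `BraceStyle` below (not code from Tika or Lucene) is a minimal illustration:

```java
// Hypothetical illustration of the "always use braces" rule; not Tika/Lucene code.
public class BraceStyle {

    // Unbraced: only the first statement is governed by the if.
    // The second line was "added to the if" later, but it runs unconditionally.
    static int unbraced(int x) {
        int count = 0;
        if (x < 0)
            count++;
            count += 10; // misleading indentation: always executes
        return count;
    }

    // Braced: both statements are unambiguously inside the block.
    static int braced(int x) {
        int count = 0;
        if (x < 0) {
            count++;
            count += 10;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(unbraced(5)); // prints 10 (the latent bug)
        System.out.println(braced(5));   // prints 0
    }
}
```

Both methods agree for negative input (each returns 11), which is exactly why this kind of bug can slip through review until the non-negative path is exercised.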
> Add list detection to RTFParser
> -------------------------------
>                 Key: TIKA-1062
>                 URL: https://issues.apache.org/jira/browse/TIKA-1062
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>            Reporter: Axel Dörfler
>            Assignee: Michael McCandless
>            Priority: Minor
>              Labels: patch
>             Fix For: 1.4
>         Attachments: testRTFListLibreOffice.rtf, testRTFListMicrosoftWord.rtf, tika-rtf-lists.patch
> RTF supports lists, and the parser could support those, too, using HTML <ul>/<ol>/<li>.
> I'm attaching a patch that implements basic support for Word 97 and newer lists. Nested
> lists are not yet supported correctly, and a number of formatting options are ignored.
> I've also added test cases for this, and adapted existing tests where needed.
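
The quoted description maps RTF lists onto `<ul>`/`<ol>`/`<li>`. Below is a minimal sketch of that mapping, assuming the parser has already reduced each paragraph to its `\lsN` list-override number (0 when the paragraph is not in a list) and the `\levelnfcN` number format of its list level, where the RTF specification uses `\levelnfc23` for bullets. `RtfListSketch`, `Paragraph` and `toXhtml` are hypothetical names, not the patch's or Tika's API. In line with the limitation stated above, the sketch ignores nesting (`\ilvl`).

```java
import java.util.List;

// Hypothetical sketch of flat RTF-list-to-XHTML mapping; not Tika's implementation.
public class RtfListSketch {

    // RTF \levelnfc value meaning "bullet"; every other format is treated as numbered.
    static final int LEVELNFC_BULLET = 23;

    static final class Paragraph {
        final int listId;   // \lsN value, 0 if the paragraph is not in a list
        final int levelNfc; // \levelnfcN of the paragraph's list level
        final String text;

        Paragraph(int listId, int levelNfc, String text) {
            this.listId = listId;
            this.levelNfc = levelNfc;
            this.text = text;
        }
    }

    // Text is not escaped here; a real SAX-based handler would take care of that.
    static String toXhtml(List<Paragraph> paragraphs) {
        StringBuilder out = new StringBuilder();
        int openListId = 0;    // \ls of the currently open list, 0 if none
        String openTag = null; // "ul" or "ol" for the open list, null if none
        for (Paragraph p : paragraphs) {
            String tag = p.listId == 0 ? null
                    : (p.levelNfc == LEVELNFC_BULLET ? "ul" : "ol");
            // Close the open list when the paragraph leaves it or switches list/type.
            if (openTag != null && (p.listId != openListId || !openTag.equals(tag))) {
                out.append("</").append(openTag).append('>');
                openTag = null;
                openListId = 0;
            }
            if (tag == null) {
                out.append("<p>").append(p.text).append("</p>");
                continue;
            }
            // Open a new list on the first item of a run.
            if (openTag == null) {
                out.append('<').append(tag).append('>');
                openTag = tag;
                openListId = p.listId;
            }
            out.append("<li>").append(p.text).append("</li>");
        }
        if (openTag != null) {
            out.append("</").append(openTag).append('>');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints <p>Intro</p><ul><li>apple</li><li>pear</li></ul><ol><li>first</li><li>second</li></ol><p>Outro</p>
        System.out.println(toXhtml(List.of(
                new Paragraph(0, 0, "Intro"),
                new Paragraph(1, LEVELNFC_BULLET, "apple"),
                new Paragraph(1, LEVELNFC_BULLET, "pear"),
                new Paragraph(2, 0, "first"),
                new Paragraph(2, 0, "second"),
                new Paragraph(0, 0, "Outro"))));
    }
}
```

Keying the open list on the `\ls` number, rather than on the tag alone, keeps two adjacent but distinct bullet lists from being merged into a single `<ul>`.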
