tika-dev mailing list archives

From "Jukka Zitting (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-1062) Add list detection to RTFParser
Date Fri, 25 Jan 2013 06:13:12 GMT

    [ https://issues.apache.org/jira/browse/TIKA-1062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562457#comment-13562457 ]

Jukka Zitting commented on TIKA-1062:

bq. coding style

We've generally tried to stick with the [standard Java conventions|http://www.oracle.com/technetwork/java/javase/documentation/codeconvtoc-136057.html]
(with spaces instead of tabs), but haven't been too strict about that. If you write new
code, you get to decide what it looks like (within reason :-). If you modify existing code,
try to stick with the existing style.
> Add list detection to RTFParser
> -------------------------------
>                 Key: TIKA-1062
>                 URL: https://issues.apache.org/jira/browse/TIKA-1062
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>            Reporter: Axel Dörfler
>            Assignee: Michael McCandless
>            Priority: Minor
>              Labels: patch
>             Fix For: 1.4
>         Attachments: testRTFListLibreOffice.rtf, testRTFListMicrosoftWord.rtf, tika-rtf-lists.patch
> RTF supports lists, and the parser could support them too, using HTML <ul>/<ol>/<li>.
> I'm attaching a patch that implements basic support for Word 97 and newer lists. Nested
> lists are not yet supported correctly, though, and a number of formatting options are ignored.
> I've also added test cases for this, and adapted existing tests where needed.

