tika-dev mailing list archives

From Axel Dörfler (JIRA) <j...@apache.org>
Subject [jira] [Updated] (TIKA-1062) Add list detection to RTFParser
Date Wed, 23 Jan 2013 15:48:16 GMT

     [ https://issues.apache.org/jira/browse/TIKA-1062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Axel Dörfler updated TIKA-1062:

    Attachment: tika-rtf-lists.patch

The attached RTF files belong in "tika-parsers/src/test/resources/test-documents".

> Add list detection to RTFParser
> -------------------------------
>                 Key: TIKA-1062
>                 URL: https://issues.apache.org/jira/browse/TIKA-1062
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>            Reporter: Axel Dörfler
>            Priority: Minor
>              Labels: patch
>         Attachments: testRTFListLibreOffice.rtf, testRTFListMicrosoftWord.rtf, tika-rtf-lists.patch
> RTF supports lists, and the parser could support them too, using HTML <ul>/<ol>/<li>.
> I'm attaching a patch that implements basic support for Word 97 and newer lists. Nested lists are not yet supported correctly, and a number of formatting options are ignored.
> I've also added test cases for this and adapted existing tests where needed.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira