tika-dev mailing list archives

From "Aleksandr Dubinsky (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (TIKA-1309) RTF TextExtractor ignores consecutive linebreaks
Date Sat, 24 May 2014 14:12:01 GMT

     [ https://issues.apache.org/jira/browse/TIKA-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksandr Dubinsky updated TIKA-1309:
-------------------------------------

    Description: RTF files (such as those produced by WordPad) often encode consecutive linebreaks
as simply consecutive \par commands. However, org.apache.tika.parser.rtf.TextExtractor ignores
the second \par. Solution is very simple. See attached patch.  (was: RTF files (such as those
produced by WordPad) typically encode consecutive linebreaks as simply consecutive \par commands.
However, org.apache.tika.parser.rtf.TextExtractor ignores the second \par. Solution is very
simple. See attached patch.)

> RTF TextExtractor ignores consecutive linebreaks
> ------------------------------------------------
>
>                 Key: TIKA-1309
>                 URL: https://issues.apache.org/jira/browse/TIKA-1309
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.5, 1.6
>            Reporter: Aleksandr Dubinsky
>         Attachments: 0001-fix-RTF-ignores-consecutive-newlines.patch, test.rtf
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> RTF files (such as those produced by WordPad) often encode consecutive linebreaks as
> simply consecutive \par commands. However, org.apache.tika.parser.rtf.TextExtractor ignores
> the second \par. Solution is very simple. See attached patch.
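
For illustration only, here is a minimal sketch of the behavior the report asks for: every \par yields its own line break, so two consecutive \par commands produce an empty line. This is not Tika's TextExtractor and not the attached patch. The class ParSketch and the method toText are hypothetical names, and the toy tokenizer assumes simple input (it drops all other control words and does not skip header groups such as font tables).

```java
// Hypothetical toy converter, NOT Tika's TextExtractor and NOT the attached patch.
class ParSketch {

    /** Keeps plain text, emits one '\n' per \par, and drops every other control word. */
    static String toText(String rtf) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < rtf.length()) {
            char c = rtf.charAt(i);
            if (c == '\\') {
                int start = ++i;
                // A control word is a run of ASCII letters after the backslash.
                while (i < rtf.length() && isAsciiLetter(rtf.charAt(i))) {
                    i++;
                }
                String word = rtf.substring(start, i);
                if (word.isEmpty()) {
                    // Control symbol such as \{ or \\ : this toy simply drops it.
                    if (i < rtf.length()) {
                        i++;
                    }
                    continue;
                }
                // Optional numeric parameter, possibly negative (e.g. \li-720).
                if (i + 1 < rtf.length() && rtf.charAt(i) == '-'
                        && Character.isDigit(rtf.charAt(i + 1))) {
                    i++;
                }
                while (i < rtf.length() && Character.isDigit(rtf.charAt(i))) {
                    i++;
                }
                // A single space after a control word is a delimiter, not text.
                if (i < rtf.length() && rtf.charAt(i) == ' ') {
                    i++;
                }
                // The point of the report: each \par counts, including repeats.
                if (word.equals("par")) {
                    out.append('\n');
                }
            } else if (c == '{' || c == '}' || c == '\r' || c == '\n') {
                i++; // group braces and raw line endings carry no text
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    private static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    public static void main(String[] args) {
        String rtf = "{\\rtf1 first\\par\\par second\\par}";
        // Show line breaks visibly so the empty line between the words is obvious.
        System.out.println(toText(rtf).replace("\n", "\\n"));
    }
}
```

With the sample input in main, the two adjacent \par commands become two newline characters between "first" and "second", which is the output the report expects from the extractor.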



--
This message was sent by Atlassian JIRA
(v6.2#6252)
