tika-dev mailing list archives

From "Jukka Zitting (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (TIKA-392) RTF parser smashes words together in subsequent table cells
Date Wed, 24 Mar 2010 13:10:27 GMT

     [ https://issues.apache.org/jira/browse/TIKA-392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting resolved TIKA-392.
--------------------------------

       Resolution: Fixed
    Fix Version/s: 0.7
         Assignee: Jukka Zitting

Fixed in revision 927044 by explicitly adding extra whitespace between subsequent text runs.
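The resolution note says the fix adds whitespace between subsequent text runs. The following Java sketch is a rough illustration of that idea only. The class `TextRunJoiner` and its `join` method are hypothetical and are not the code committed in revision 927044. The sketch assumes the parser already holds each cell's text as a separate string. It shows why plain concatenation loses the cell boundary and how one separator restores it.

```java
import java.util.List;

/**
 * Hypothetical sketch (not Tika's actual code): concatenate extracted text
 * runs, inserting a single space between two runs unless one of them already
 * supplies whitespace at the boundary.
 */
public final class TextRunJoiner {

    private TextRunJoiner() {}

    public static String join(List<String> runs) {
        StringBuilder out = new StringBuilder();
        for (String run : runs) {
            if (run.isEmpty()) {
                continue; // nothing to append, so no separator is needed
            }
            boolean needsSpace = out.length() > 0
                    && !Character.isWhitespace(out.charAt(out.length() - 1))
                    && !Character.isWhitespace(run.charAt(0));
            if (needsSpace) {
                out.append(' '); // keep adjacent runs from smashing together
            }
            out.append(run);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> cells = List.of("Fax / Phone Station", "Fax / Phone #");
        // Plain concatenation: prints "Fax / Phone StationFax / Phone #"
        System.out.println(String.join("", cells));
        // With the extra whitespace: prints "Fax / Phone Station Fax / Phone #"
        System.out.println(join(cells));
    }
}
```

The sketch adds a space only when neither side already ends or starts with whitespace, so existing spacing is not doubled. A real parser might prefer a tab or a structured cell element instead.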

> RTF parser smashes words together in subsequent table cells
> -----------------------------------------------------------
>
>                 Key: TIKA-392
>                 URL: https://issues.apache.org/jira/browse/TIKA-392
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Jukka Zitting
>            Assignee: Jukka Zitting
>            Priority: Minor
>             Fix For: 0.7
>
>
> I have an RTF document with the following snippet of content (it's an export of a private phone book, so I can't share the full document):
> {\rtlch\fcs1 \af0\afs24 \ltrch\fcs0 \f0\fs24\lang2055\langfe2055\langfenp2055\insrsid9461491\charrsid9461491 Fax / Phone Station\cell Fax / Phone #\cell }
> The extracted text is:
> Fax / Phone StationFax / Phone
> Note how the cell boundary between "Station" and "Fax" is lost.
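The next sketch makes the reported symptom concrete. It is a deliberately minimal Java scanner, `MiniRtfText`, a hypothetical helper that is far simpler than Tika's real RTF handling, run on the snippet from the report. The scanner discards group braces and control words, keeps literal text, and maps the `\cell` control word to a caller-chosen separator. Passing an empty separator reproduces the smashed words. It assumes a plain ASCII snippet and ignores hex escapes, Unicode control words, and destinations.

```java
/**
 * Hypothetical, deliberately minimal RTF text scanner (not Tika's parser).
 * Drops group braces and control words, keeps literal text, and replaces the
 * \cell control word with a caller-supplied separator.
 * Not handled: \'hh hex escapes, Unicode control words, {\*...} destinations,
 * \par or \tab, and character-set decoding.
 */
public final class MiniRtfText {

    private MiniRtfText() {}

    private static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    private static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    /** Returns the plain text of rtf, emitting cellSeparator at each \cell. */
    public static String extract(String rtf, String cellSeparator) {
        StringBuilder out = new StringBuilder();
        int n = rtf.length();
        int i = 0;
        while (i < n) {
            char c = rtf.charAt(i);
            if (c == '{' || c == '}' || c == '\r' || c == '\n') {
                i++; // group delimiters and raw line breaks carry no text
            } else if (c != '\\') {
                out.append(c); // ordinary text character
                i++;
            } else if (i + 1 >= n) {
                i++; // dangling backslash at end of input: ignore it
            } else if (isAsciiLetter(rtf.charAt(i + 1))) {
                // Control word: backslash, letters, optional signed number.
                int j = i + 1;
                while (j < n && isAsciiLetter(rtf.charAt(j))) {
                    j++;
                }
                String word = rtf.substring(i + 1, j);
                if (j + 1 < n && rtf.charAt(j) == '-' && isAsciiDigit(rtf.charAt(j + 1))) {
                    j++; // negative parameter, e.g. \li-360
                }
                while (j < n && isAsciiDigit(rtf.charAt(j))) {
                    j++;
                }
                if (j < n && rtf.charAt(j) == ' ') {
                    j++; // one space after a control word is a delimiter, not text
                }
                if (word.equals("cell")) {
                    out.append(cellSeparator); // end of a table cell
                }
                i = j;
            } else {
                // Control symbol: \\, \{ and \} are escaped literals; others are dropped.
                char sym = rtf.charAt(i + 1);
                if (sym == '\\' || sym == '{' || sym == '}') {
                    out.append(sym);
                }
                i += 2;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String snippet = "{\\rtlch\\fcs1 \\af0\\afs24 \\ltrch\\fcs0 \\f0\\fs24\\lang2055"
                + "\\langfe2055\\langfenp2055\\insrsid9461491\\charrsid9461491 "
                + "Fax / Phone Station\\cell Fax / Phone #\\cell }";
        // Dropping \cell entirely: prints "Fax / Phone StationFax / Phone #"
        System.out.println(extract(snippet, ""));
        // Mapping \cell to a tab keeps the two cells apart.
        System.out.println(extract(snippet, "\t"));
    }
}
```

Treating `\cell` as a boundary is one possible way to recover the lost separation. The actual fix described in the resolution instead adds whitespace between text runs in general.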

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

