tika-dev mailing list archives

From "Jukka Zitting (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-642) Few of RTF files not extracting properly
Date Thu, 19 May 2011 12:12:47 GMT

    [ https://issues.apache.org/jira/browse/TIKA-642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036139#comment-13036139 ]

Jukka Zitting commented on TIKA-642:
------------------------------------

There are no good alternatives until someone finds (or creates) a liberally licensed RTF parser
library that we could use instead of relying on javax.swing. Until then, the best we can do is
to keep adding tweaks and hacks like the ones we already have in the RTFParser class.

As a minor improvement, in revision 1124702 I added code that converts such parse errors
from IOExceptions to TikaExceptions, so we get a better error message than the generic
"Illegal IOException".
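
For illustration only, the conversion described above could look roughly like the sketch below.
This is not the code from revision 1124702. The TikaException class here is a local stand-in for
org.apache.tika.exception.TikaException so that the example compiles without Tika on the
classpath, and the RtfReader interface, the extractText method and the message text are
hypothetical names introduced for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class RtfErrorWrapping {

    // Local stand-in for org.apache.tika.exception.TikaException (assumption:
    // only the (String, Throwable) constructor is needed for this sketch).
    static class TikaException extends Exception {
        TikaException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Hypothetical abstraction over the underlying RTF reader, which in the
    // reported stack trace is javax.swing.text.rtf.RTFEditorKit.
    interface RtfReader {
        String read(InputStream in) throws IOException;
    }

    // Runs the reader and converts any IOException it throws into a
    // TikaException, keeping the original exception as the cause.
    static String extractText(RtfReader reader, InputStream in) throws TikaException {
        try {
            return reader.read(in);
        } catch (IOException e) {
            throw new TikaException(
                    "Error parsing an RTF document: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        // Simulates the failure from the stack trace without needing a real file.
        RtfReader broken = in -> {
            throw new IOException("Too many close-groups in RTF text");
        };
        try {
            extractText(broken, new ByteArrayInputStream(new byte[0]));
        } catch (TikaException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Note that this sketch treats every IOException as a parse error. A real implementation would
probably also want to let genuine stream read failures propagate unchanged.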

> Few of RTF files not extracting properly
> ----------------------------------------
>
>                 Key: TIKA-642
>                 URL: https://issues.apache.org/jira/browse/TIKA-642
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.9, 1.0
>         Environment: All
>            Reporter: Manish
>         Attachments: FIRM GAS GTC B RED.DOC
>
>
> A few of the RTF files don't get extracted properly.
> This is the stack trace: 
> org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.rtf.RTFParser@616d071a
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:203)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
> Caused by: java.io.IOException: Too many close-groups in RTF text
> at javax.swing.text.rtf.RTFParser.write(RTFParser.java:156)
> at javax.swing.text.rtf.RTFParser.writeSpecial(RTFParser.java:101)
> at javax.swing.text.rtf.AbstractFilter.write(AbstractFilter.java:158)
> at javax.swing.text.rtf.AbstractFilter.readFromStream(AbstractFilter.java:88)
> at javax.swing.text.rtf.RTFEditorKit.read(RTFEditorKit.java:65)
> at org.apache.tika.parser.rtf.RTFParser.parse(RTFParser.java:112)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
