tika-dev mailing list archives

From "Ken Krugler (JIRA)" <j...@apache.org>
Subject [jira] Commented: (TIKA-335) TXTParser should use incoming charset
Date Tue, 01 Dec 2009 03:03:22 GMT

    [ https://issues.apache.org/jira/browse/TIKA-335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784026#action_12784026 ]

Ken Krugler commented on TIKA-335:
----------------------------------

It should, yes: it passes both in Eclipse and in the Maven build.

This could be another case of a UTF-8 character written directly into a string literal, similar to TIKA-334. Try using this in testUsingIncomingCharsetAsHint:

        final String test2 = "the name is \u00e1ndre";
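A small sketch of why the escaped literal matters. `\u00e1` is translated by javac before the string is parsed, so the literal no longer depends on the encoding the test source file was saved in. The class name `EscapedLiteralDemo` is hypothetical, and so is the ISO-8859-1 round trip: it is an assumed setup, not necessarily what testUsingIncomingCharsetAsHint does. It shows that the same bytes decode correctly only when the declared charset is honored.

```java
import java.nio.charset.StandardCharsets;

public class EscapedLiteralDemo {
    // The Unicode escape for U+00E1 is translated by javac before the string
    // is parsed, so this literal is identical whatever encoding the .java
    // file is saved in.
    static final String TEST2 = "the name is \u00e1ndre";

    public static void main(String[] args) {
        // Assumption (hypothetical setup): the test writes the text as
        // ISO-8859-1, so U+00E1 becomes the single byte 0xE1.
        byte[] bytes = TEST2.getBytes(StandardCharsets.ISO_8859_1);

        // Decoding with the declared charset recovers the original text.
        String withHint = new String(bytes, StandardCharsets.ISO_8859_1);

        // Decoding as UTF-8 does not: 0xE1 followed by 'n' is a malformed
        // sequence, and the decoder substitutes U+FFFD for it.
        String asUtf8 = new String(bytes, StandardCharsets.UTF_8);

        System.out.println(bytes.length);           // 17
        System.out.println(withHint.equals(TEST2)); // true
        System.out.println(asUtf8.equals(TEST2));   // false
    }
}
```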


> TXTParser should use incoming charset
> -------------------------------------
>
>                 Key: TIKA-335
>                 URL: https://issues.apache.org/jira/browse/TIKA-335
>             Project: Tika
>          Issue Type: Improvement
>    Affects Versions: 0.5
>            Reporter: Ken Krugler
>            Priority: Minor
>         Attachments: TIKA-335.patch
>
>
> The incoming charset (if any) from metadata should be passed to CharsetDetector.setDeclaredEncoding().
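The issue asks for the incoming charset to be handed to `CharsetDetector.setDeclaredEncoding()`, where it serves as a hint rather than an override. The following is a minimal sketch of that "hint, not override" idea using only `java.nio`. It does not call CharsetDetector and does not reproduce its statistical detection. Instead it tries the declared charset first with strict decoding and ignores the hint when the name is unknown or the bytes are invalid under it. The names `DeclaredCharsetHint` and `decodeWithHint` are hypothetical, the declared charset is passed as a plain string rather than read from Tika metadata, and the UTF-8 then ISO-8859-1 fallback order is an assumption.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

public class DeclaredCharsetHint {

    /** Decodes strictly; returns null if the bytes are not valid in cs. */
    static String tryDecode(byte[] bytes, Charset cs) {
        CharsetDecoder decoder = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    /**
     * Hypothetical helper: prefer the declared charset when it names a
     * supported charset and the bytes decode cleanly under it; otherwise
     * fall back to UTF-8, then ISO-8859-1 (which accepts any byte sequence).
     */
    static String decodeWithHint(byte[] bytes, String declared) {
        if (declared != null) {
            try {
                String text = tryDecode(bytes, Charset.forName(declared.trim()));
                if (text != null) {
                    return text;
                }
            } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
                // Unknown or malformed charset name: ignore the hint.
            }
        }
        String utf8 = tryDecode(bytes, StandardCharsets.UTF_8);
        return utf8 != null ? utf8 : new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String text = "the name is \u00e1ndre";
        byte[] utf16 = text.getBytes(StandardCharsets.UTF_16LE);
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

        System.out.println(decodeWithHint(utf16, "UTF-16LE").equals(text)); // true: hint used
        System.out.println(decodeWithHint(utf16, null).equals(text));       // false: no hint
        System.out.println(decodeWithHint(utf8, "US-ASCII").equals(text));  // true: bad hint rejected
    }
}
```

The third call shows the design choice. A declared charset that cannot decode the bytes (UTF-8 text labeled US-ASCII) is discarded rather than trusted, which is roughly the behavior one would want from a hint.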

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

