tika-dev mailing list archives

From "John Mastarone (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (TIKA-867) UTF-8 encoding does not work on windows
Date Fri, 18 May 2012 00:59:02 GMT

     [ https://issues.apache.org/jira/browse/TIKA-867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Mastarone updated TIKA-867:

    Attachment: TIKA-867.patch

This issue appears to be the Windows counterpart of TIKA-324, so I've submitted a patch that
applies the same fix here.  The patch alone is not enough: you also need to switch the console
code page with the chcp command in the Windows command prompt.  With the Lucida Console or
Consolas font, I see the correct output of "Währung" if I run "chcp 65001" before running
Tika.  My default code page, 437, does not produce the correct output, and neither does 850.
Without the patch, chcp does not seem to help: I tried several code pages, including the
three above, without success.
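
For readers without the patch at hand, here is a minimal sketch of the general technique: writing console output through a stream that always encodes as UTF-8, regardless of the platform default. This is an illustration only and not the contents of TIKA-867.patch or of the TIKA-324 fix; the class name Utf8Console and the helper utf8() are hypothetical.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

final class Utf8Console {
    // Hypothetical helper: wrap any stream so that printed text is encoded
    // as UTF-8 instead of the platform default charset.
    static PrintStream utf8(OutputStream out) {
        try {
            // autoflush = true, explicit encoding name
            return new PrintStream(out, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8(System.out);
        // \u00e4 is "ä"; the escape keeps the source file ASCII-only.
        out.println("W\u00e4hrung");
    }
}
```

On Windows the console still has to interpret those bytes as UTF-8, which is presumably why "chcp 65001" is needed in addition to a change of this kind.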
> UTF-8 encoding does not work on windows
> ---------------------------------------
>                 Key: TIKA-867
>                 URL: https://issues.apache.org/jira/browse/TIKA-867
>             Project: Tika
>          Issue Type: Bug
>          Components: cli
>    Affects Versions: 1.0
>         Environment: Windows 7 Enterprise (Java 1.6.0_31) and MAC OS X 10.7.3 (Java 1.6.0_30)
>            Reporter: Wolfgang Außerlechner
>         Attachments: TIKA-867.patch
> When calling tika as command line tool from within java and parsing the output buffer
> with UTF-8 (e.g. new String(buffer, 0, len, Charset.forName("UTF-8"));) behaviour on windows
> is different than on mac os.
> On windows the encoding seems to be wrong (Währung vs. W?hrung). Other tools like exiftool
> work as expected.
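
The symptom described above is consistent with an encoding mismatch between writer and reader, which can be reproduced in isolation. The sketch below is an assumption about the cause, not a confirmed diagnosis: it encodes text with a single-byte charset (ISO-8859-1 stands in for a Windows default code page) and decodes it with the same new String(..., Charset.forName("UTF-8")) call quoted in the report. The class name MismatchDemo and the method roundTrip() are hypothetical.

```java
import java.nio.charset.Charset;

final class MismatchDemo {
    // Encode text with one charset and decode the bytes with another,
    // mimicking a child process and a parent that disagree on the encoding.
    static String roundTrip(String text, String writerCharset, String readerCharset) {
        byte[] buffer = text.getBytes(Charset.forName(writerCharset));
        return new String(buffer, 0, buffer.length, Charset.forName(readerCharset));
    }

    public static void main(String[] args) {
        String word = "W\u00e4hrung"; // "Währung", written with an ASCII-only escape

        // Writer and reader agree: the text survives unchanged.
        System.out.println(word.equals(roundTrip(word, "UTF-8", "UTF-8")));

        // Writer uses a single-byte charset, reader assumes UTF-8: the byte
        // for "ä" is not valid UTF-8 and becomes the replacement character.
        String broken = roundTrip(word, "ISO-8859-1", "UTF-8");
        System.out.println(broken.indexOf('\uFFFD'));
    }
}
```

A replacement character is commonly rendered as "?" by consoles and fonts that cannot display it, which would match the "W?hrung" output in the report.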

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

