tika-dev mailing list archives

From "Christian Moen (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-855) Language Detection not working for Japanese and Chinese.
Date Mon, 20 Feb 2012 10:19:35 GMT

    [ https://issues.apache.org/jira/browse/TIKA-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211760#comment-13211760 ]

Christian Moen commented on TIKA-855:

Thanks, James.  I've linked the issues.  Perhaps we can track this in TIKA-856.
> Language Detection not working for Japanese and Chinese.
> --------------------------------------------------------
>                 Key: TIKA-855
>                 URL: https://issues.apache.org/jira/browse/TIKA-855
>             Project: Tika
>          Issue Type: Bug
>          Components: languageidentifier
>    Affects Versions: 1.0
>         Environment: Windows XP, Vista and Linux Ubuntu 11.10 using Sun Java 6 and Oracle Java 7
>            Reporter: James Sullivan
>            Assignee: Ken Krugler
>            Priority: Minor
>              Labels: Chinese, Japanese
> I have tried Tika 1.0 language detection (java -jar tika.jar -l .\Japanese.txt) on several Japanese files (both PDF and text files) and it consistently returns lt (Lithuanian???) instead of ja. I also tried on a Chinese file which similarly incorrectly returned lt. Both English language and French language detection worked correctly.


