tika-dev mailing list archives

From "Christian Moen (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-855) Language Detection not working for Japanese and Chinese.
Date Mon, 20 Feb 2012 10:19:35 GMT

    [ https://issues.apache.org/jira/browse/TIKA-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211760#comment-13211760 ]

Christian Moen commented on TIKA-855:
-------------------------------------

Thanks, James.  I've linked the issues.  Perhaps we can track this in TIKA-856.
                
> Language Detection not working for Japanese and Chinese.
> --------------------------------------------------------
>
>                 Key: TIKA-855
>                 URL: https://issues.apache.org/jira/browse/TIKA-855
>             Project: Tika
>          Issue Type: Bug
>          Components: languageidentifier
>    Affects Versions: 1.0
>         Environment: Windows XP, Vista and Linux Ubuntu 11.10 using Sun Java 6 and Oracle Java 7
>            Reporter: James Sullivan
>            Assignee: Ken Krugler
>            Priority: Minor
>              Labels: Chinese, Japanese
>
> I have tried Tika 1.0 language detection (java -jar tika.jar -l .\Japanese.txt) on several
> Japanese files (both PDF and text files) and it consistently returns lt (Lithuanian???) instead
> of ja. I also tried on a Chinese file which similarly incorrectly returned lt. Both English
> language and French language detection worked correctly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
