tika-dev mailing list archives

From "Christian Moen (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-856) Support CJK (Chinese, Japanese and Korean) language detection
Date Sun, 19 Feb 2012 17:46:36 GMT

    [ https://issues.apache.org/jira/browse/TIKA-856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211440#comment-13211440 ]

Christian Moen commented on TIKA-856:
-------------------------------------

Thanks, Jan R.  The {{language-detection}} library is similar to Tika's, and the command
line mentioned in your link and the one Jan H. mentions above basically do the same thing.

Jan H., I'll see if I can put together some language profiles for CJK for Tika later this
week.

                
> Support CJK (Chinese, Japanese and Korean) language detection
> -------------------------------------------------------------
>
>                 Key: TIKA-856
>                 URL: https://issues.apache.org/jira/browse/TIKA-856
>             Project: Tika
>          Issue Type: New Feature
>          Components: languageidentifier
>    Affects Versions: 1.0
>         Environment: All
>            Reporter: James Sullivan
>              Labels: Chinese, Japanese
>
> Support language detection of CJK (Chinese, Japanese and Korean).
> Some estimates have Chinese users overtaking English users on the Internet, so it is
> important that these languages, used by a large number of people, be supported.
> See TIKA-855

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
