tika-dev mailing list archives

From "Ken Krugler (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-1723) Integrate language-detector into Tika
Date Wed, 03 Feb 2016 16:56:39 GMT

    [ https://issues.apache.org/jira/browse/TIKA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15130676#comment-15130676 ]

Ken Krugler commented on TIKA-1723:
-----------------------------------

[~tallison@apache.org] I must admit, it would be nice to focus on this change in 2.0 and not
worry about backwards compatibility (if that's OK). Or would we still want to keep the old
language detector API around? I'm hoping the answer is no :)

> Integrate language-detector into Tika
> -------------------------------------
>
>                 Key: TIKA-1723
>                 URL: https://issues.apache.org/jira/browse/TIKA-1723
>             Project: Tika
>          Issue Type: Improvement
>          Components: languageidentifier
>    Affects Versions: 1.11
>            Reporter: Ken Krugler
>            Assignee: Ken Krugler
>            Priority: Minor
>         Attachments: TIKA-1723-2.patch, TIKA-1723-3.patch, TIKA-1723.patch, TIKA-1723v2.patch
>
>
> The language-detector project at https://github.com/optimaize/language-detector is faster,
> supports more languages (70 vs 13), and has better accuracy than the built-in language detector.
> This is a stab at integrating it, with some initial findings. It raises a number of issues,
> especially if [~chrismattmann] moves forward with turning language detection
> into a pluggable extension point.
> I'll add comments with results below.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
