tika-dev mailing list archives

From "Ken Krugler (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-465) LanguageIdentifier API enhancements
Date Sun, 01 Mar 2015 22:59:04 GMT

    [ https://issues.apache.org/jira/browse/TIKA-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14342525#comment-14342525 ]

Ken Krugler commented on TIKA-465:

I'm actually working on a new language detector, so I think this can be closed.

> LanguageIdentifier API enhancements
> -----------------------------------
>                 Key: TIKA-465
>                 URL: https://issues.apache.org/jira/browse/TIKA-465
>             Project: Tika
>          Issue Type: Improvement
>          Components: languageidentifier
>            Reporter: Chris A. Mattmann
>            Assignee: Ken Krugler
>            Priority: Minor
> As originally reported by Jerome Charron in NUTCH-86, Jerome identified a set of improvements
> for the LanguageIdentifier that we should consider in Tika:
> {quote}
> More information can be found in the following thread on the Nutch-Dev mailing list:
> http://www.mail-archive.com/nutch-dev%40lucene.apache.org/msg00569.html
> Summary:
> 1. LanguageIdentifier API changes. The similarity methods should return an ordered array
> of language-code/score pairs instead of a simple String containing the language-code.
> 2. Ensure consistency between LanguageIdentifier scoring and NGramProfile.getSimilarity().
> {quote}
> I just wanted to capture the issue here in Tika, since I'm about to close it out in Nutch
> since LanguageIdentification is something that can happen in Tika-ville...
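
For illustration only, item 1 of the summary above (returning an ordered array of language-code/score pairs rather than a single language-code String) could look roughly like the sketch below. The class LanguageRanking, the nested LanguageScore type, and the rank method are hypothetical names invented for this example; they are not part of the Tika or Nutch API, and the similarity values are assumed to have been computed elsewhere.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: not the Tika or Nutch LanguageIdentifier API.
public class LanguageRanking {

    /** Hypothetical language-code/score pair. */
    public static final class LanguageScore {
        public final String language;
        public final double score;

        public LanguageScore(String language, double score) {
            this.language = language;
            this.score = score;
        }

        @Override
        public String toString() {
            return language + "=" + score;
        }
    }

    /**
     * Turns per-language similarity values (assumed to be computed elsewhere,
     * higher meaning more similar) into an array ordered from best to worst match.
     */
    public static LanguageScore[] rank(Map<String, Double> similarities) {
        List<LanguageScore> result = new ArrayList<>();
        for (Map.Entry<String, Double> entry : similarities.entrySet()) {
            result.add(new LanguageScore(entry.getKey(), entry.getValue()));
        }
        // Sort in descending order of score so the best candidate comes first.
        result.sort((a, b) -> Double.compare(b.score, a.score));
        return result.toArray(new LanguageScore[0]);
    }

    public static void main(String[] args) {
        // Made-up similarity values, used only to exercise the ordering.
        Map<String, Double> similarities = new LinkedHashMap<>();
        similarities.put("fr", 0.40);
        similarities.put("en", 0.91);
        similarities.put("de", 0.55);

        for (LanguageScore candidate : rank(similarities)) {
            System.out.println(candidate);
        }
    }
}
```

With a return type like this, a caller that only wants the single best guess can read the first element, while a caller that needs to judge confidence can compare the top scores.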

This message was sent by Atlassian JIRA
