tika-dev mailing list archives

From "Chris A. Mattmann (JIRA)" <j...@apache.org>
Subject [jira] Created: (TIKA-465) LanguageIdentifier API enhancements
Date Wed, 14 Jul 2010 17:40:51 GMT
LanguageIdentifier API enhancements

                 Key: TIKA-465
                 URL: https://issues.apache.org/jira/browse/TIKA-465
             Project: Tika
          Issue Type: Improvement
          Components: languageidentifier
            Reporter: Chris A. Mattmann
            Assignee: Chris A. Mattmann
            Priority: Minor

In NUTCH-86, Jerome Charron identified a set of improvements for the
LanguageIdentifier that we should consider in Tika:

More information can be found in the following thread on the Nutch-Dev mailing list:


1. LanguageIdentifier API changes. The similarity methods should return an ordered array of
language-code/score pairs instead of a simple String containing the language-code.

2. Ensure consistency between LanguageIdentifier scoring and NGramProfile.getSimilarity().
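
To make the two items above concrete, here is a rough sketch of what a ranked result could look like. It is not the existing Tika or Nutch API: the class RankedLanguageSketch, the LanguageScore type, and the identify(), profile() and similarity() methods are all hypothetical names, and cosine similarity over character trigrams is only a stand-in for whatever measure NGramProfile.getSimilarity() actually uses. The point is that ranking and pairwise comparison go through the same similarity function, so their scores cannot disagree.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch only; none of these names exist in Tika or Nutch. */
final class RankedLanguageSketch {

    /** A language code paired with its similarity score (hypothetical type). */
    public static final class LanguageScore implements Comparable<LanguageScore> {
        public final String language;
        public final double score;

        public LanguageScore(String language, double score) {
            this.language = language;
            this.score = score;
        }

        /** Orders results so that the highest score comes first. */
        @Override
        public int compareTo(LanguageScore other) {
            return Double.compare(other.score, this.score);
        }

        @Override
        public String toString() {
            return language + "=" + score;
        }
    }

    /** Builds relative frequencies of character n-grams for a text. */
    public static Map<String, Double> profile(String text, int n) {
        Map<String, Double> counts = new HashMap<>();
        String s = text.toLowerCase();
        int total = 0;
        for (int i = 0; i + n <= s.length(); i++) {
            counts.merge(s.substring(i, i + n), 1.0, Double::sum);
            total++;
        }
        final int t = total;
        if (t > 0) {
            counts.replaceAll((gram, count) -> count / t);
        }
        return counts;
    }

    /**
     * The single similarity function shared by ranking and pairwise
     * comparison. Cosine similarity is an assumption made for illustration.
     */
    public static double similarity(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (double v : b.values()) {
            normB += v * v;
        }
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    /** Returns every known language with its score, best match first. */
    public static LanguageScore[] identify(String text,
                                           Map<String, Map<String, Double>> profiles) {
        Map<String, Double> textProfile = profile(text, 3);
        List<LanguageScore> results = new ArrayList<>();
        for (Map.Entry<String, Map<String, Double>> e : profiles.entrySet()) {
            results.add(new LanguageScore(e.getKey(), similarity(textProfile, e.getValue())));
        }
        Collections.sort(results);
        return results.toArray(new LanguageScore[0]);
    }

    public static void main(String[] args) {
        // Toy profiles built from one sentence each; real profiles would be trained.
        Map<String, Map<String, Double>> profiles = new HashMap<>();
        profiles.put("en", profile("the quick brown fox jumps over the lazy dog", 3));
        profiles.put("fr", profile("le renard brun saute par dessus le chien paresseux", 3));
        for (LanguageScore ls : identify("the lazy dog", profiles)) {
            System.out.println(ls);
        }
    }
}
```

A caller that only wants the old behaviour can still take the language code of the first element, while callers that care about confidence can inspect the full ordered array.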

I just wanted to capture the issue here in Tika because I'm about to close it out in Nutch,
since language identification is something that can happen in Tika-ville...

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
