tika-dev mailing list archives

From: "Zaheer Beig (JIRA)" <j...@apache.org>
Subject: [jira] [Created] (TIKA-1405) German content detected as French
Date: Sat, 30 Aug 2014 10:13:52 GMT

Zaheer Beig created TIKA-1405:

             Summary: German content detected as French
                 Key: TIKA-1405
                 URL: https://issues.apache.org/jira/browse/TIKA-1405
             Project: Tika
          Issue Type: Bug
          Components: languageidentifier
    Affects Versions: 1.4
         Environment: Linux
            Reporter: Zaheer Beig

We are using Apache Tika 1.4 for document-to-text conversion and language detection in one
of our projects. We are facing the following issues with language detection:

1. When English text is written entirely in upper case, it is detected as Estonian.
2. Much of our German content is detected as French (though this does not happen for all
German content).

Any update on this would be very helpful.
