tika-dev mailing list archives

From "Ken Krugler (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-492) Add language identification support for North Sami, Lule Sami and South Sami
Date Thu, 03 Sep 2015 17:26:46 GMT

    [ https://issues.apache.org/jira/browse/TIKA-492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729427#comment-14729427 ]

Ken Krugler commented on TIKA-492:
----------------------------------

Currently the language-detector library I'm integrating (see TIKA-1723) doesn't support any
of the three Sami languages. I'd open an issue with that project (see https://github.com/optimaize/language-detector/).
So I'm closing this issue, unless somebody wants to (a) port the current built-in Tika detector
to the new architecture, (b) follow up with Jan about getting training text, and (c) add
the new profiles. I'll wait a few days before closing.

> Add language identification support for North Sami, Lule Sami and South Sami
> ----------------------------------------------------------------------------
>
>                 Key: TIKA-492
>                 URL: https://issues.apache.org/jira/browse/TIKA-492
>             Project: Tika
>          Issue Type: New Feature
>          Components: languageidentifier
>    Affects Versions: 0.7
>            Reporter: Jan Høydahl
>            Assignee: Ken Krugler
>            Priority: Minor
>
> We need added support for Sami languages.
> According to the document "Requirements for support for Sami languages in data processing"
> (http://www.samit.no/01-850-51.pdf), Tika will get "Basic Level" support by detecting North
> Sami, Lule Sami and South Sami.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
