tika-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2231) Invalid language code exception
Date Wed, 18 Jan 2017 02:33:26 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15827335#comment-15827335 ]

ASF GitHub Bot commented on TIKA-2231:
--------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/tika/pull/147


> Invalid language code exception
> -------------------------------
>
>                 Key: TIKA-2231
>                 URL: https://issues.apache.org/jira/browse/TIKA-2231
>             Project: Tika
>          Issue Type: Bug
>          Components: ocr
>    Affects Versions: 1.14
>            Reporter: Peter Weiss
>            Priority: Minor
>              Labels: beginner, easyfix, easytest, newbie
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> There is a regex in TesseractOCRConfig.setLanguage(String language) that attempts to
> validate the language being set. Unfortunately, it rejects some language codes that
> are valid for Tesseract.
> For example:
>     TesseractOCRConfig config = new TesseractOCRConfig();
>     config.setLanguage("chi_tra");
> This throws an IllegalArgumentException because of the '_' in the language name, even
> though "chi_tra" is a valid Tesseract language code.
> The regex needs to be updated to allow the '_' character.
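
For illustration, here is a minimal standalone sketch of a validation pattern that accepts the underscore described in the issue quoted above. It is not the regex used in Tika or the change made in the pull request; the class name LanguageCodeCheck, the method isValidLanguage, and the pattern itself are hypothetical. It assumes a language string is one or more codes joined by '+' (the form Tesseract accepts for combining languages), where each code consists of letters with optional underscore-separated parts.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tika: shows one way a language-code
// check could allow '_' while still rejecting other punctuation.
class LanguageCodeCheck {

    // One code: letters, optionally followed by "_letters" groups (e.g. chi_tra).
    // Several codes may be joined with '+' (e.g. eng+fra). Assumed shape only.
    private static final Pattern LANGUAGE = Pattern.compile(
            "[A-Za-z]+(_[A-Za-z]+)*(\\+[A-Za-z]+(_[A-Za-z]+)*)*");

    static boolean isValidLanguage(String language) {
        return language != null && LANGUAGE.matcher(language).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"eng", "chi_tra", "eng+chi_tra", "chi-tra", "eng+", ""};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + isValidLanguage(s));
        }
    }
}
```

With this pattern, "eng", "chi_tra", and "eng+chi_tra" are accepted, while "chi-tra", "eng+", and the empty string are rejected. A setter could throw IllegalArgumentException when isValidLanguage returns false, which keeps the existing behavior for malformed input.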



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
