tika-dev mailing list archives

From "Felix Meschberger (JIRA)" <j...@apache.org>
Subject [jira] Commented: (TIKA-322) Improve encoding detection speed and accuracy
Date Fri, 13 Aug 2010 05:03:16 GMT

    [ https://issues.apache.org/jira/browse/TIKA-322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898081#action_12898081

Felix Meschberger commented on TIKA-322:

According to [1], MPL is a Category B license, and work under such a license can be included in binary-only

[1] http://www.apache.org/legal/resolved.html#category-b

> Improve encoding detection speed and accuracy
> ---------------------------------------------
>                 Key: TIKA-322
>                 URL: https://issues.apache.org/jira/browse/TIKA-322
>             Project: Tika
>          Issue Type: Improvement
>          Components: mime
>            Reporter: Jukka Zitting
>            Priority: Minor
> The encoding detection code we took from ICU4J is not very efficient and sometimes produces
> odd results when more than one encoding matches the given input data. It would be good to
> refactor the code to be faster for easy-to-detect encodings and to have better heuristics
> in case multiple matches are found.

