tika-dev mailing list archives

From "Felix Meschberger (JIRA)" <j...@apache.org>
Subject [jira] Commented: (TIKA-322) Improve encoding detection speed and accuracy
Date Fri, 13 Aug 2010 05:03:16 GMT

    [ https://issues.apache.org/jira/browse/TIKA-322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898081#action_12898081 ]

Felix Meschberger commented on TIKA-322:
----------------------------------------

According to [1], the MPL is a Category B license, so MPL-licensed work can be included
in binary-only form.

[1] http://www.apache.org/legal/resolved.html#category-b

> Improve encoding detection speed and accuracy
> ---------------------------------------------
>
>                 Key: TIKA-322
>                 URL: https://issues.apache.org/jira/browse/TIKA-322
>             Project: Tika
>          Issue Type: Improvement
>          Components: mime
>            Reporter: Jukka Zitting
>            Priority: Minor
>
> The encoding detection code we took from ICU4J is not very efficient and sometimes produces
> odd results when more than one encoding matches the given input data. It would be good to
> refactor the code to be faster for easy-to-detect encodings and to have better heuristics
> in case multiple matches are found.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

