lucene-dev mailing list archives

From "George Rhoten (JIRA)" <j...@apache.org>
Subject [jira] [Created] (LUCENE-5224) org.apache.lucene.analysis.hunspell.HunspellDictionary should implement ICONV and OCONV lines in the affix file
Date Tue, 17 Sep 2013 22:23:51 GMT
George Rhoten created LUCENE-5224:
-------------------------------------

             Summary: org.apache.lucene.analysis.hunspell.HunspellDictionary should implement ICONV and OCONV lines in the affix file
                 Key: LUCENE-5224
                 URL: https://issues.apache.org/jira/browse/LUCENE-5224
             Project: Lucene - Core
          Issue Type: Improvement
          Components: modules/analysis
    Affects Versions: 4.4, 4.0
            Reporter: George Rhoten


Some Hunspell dictionaries need to emulate Unicode normalization and collation in order to
produce the correct stem of a word. The original Hunspell supports this through the ICONV and
OCONV lines in the affix file, which define conversion tables applied to the input and to the
output respectively. The Lucene HunspellDictionary currently ignores these lines.

Please support these keys in the affix file.

This functionality is briefly described in the hunspell man page: http://manpages.ubuntu.com/manpages/lucid/man4/hunspell.4.html

This functionality is practically required in order to use a Korean dictionary, because
stemming needs to operate on only some of the Jamos of a Hangul character (grapheme cluster).
Other languages would benefit from this functionality as well.
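
To make the Jamo point concrete, the following is a minimal sketch of how a precomposed Hangul syllable relates to its conjoining Jamos. It uses the JDK's `java.text.Normalizer`; the class name `JamoDemo` and its methods are hypothetical and exist only for illustration. It shows the kind of decomposition an ICONV table could emulate, not how Hunspell or Lucene implement it.

```java
import java.text.Normalizer;

/** Hypothetical demo class, not part of Lucene or Hunspell. */
class JamoDemo {
    /** Decomposes precomposed Hangul syllables into conjoining Jamos (Unicode NFD). */
    static String toJamos(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFD);
    }

    /** Recomposes conjoining Jamos into precomposed syllables (Unicode NFC). */
    static String toSyllables(String jamos) {
        return Normalizer.normalize(jamos, Normalizer.Form.NFC);
    }
}
```

Once a syllable is split this way, an affix rule can match or strip a single trailing Jamo, which is not possible while the syllable is a single precomposed code point.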

Here is an example for a .aff file:

{code}
ICONV 각 각
...
OCONV 각 각
{code}

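Here is the same example with the characters written as Unicode escapes (U+AC01 is the precomposed syllable; U+1100 U+1161 U+11A8 are its Jamos):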

{code}
ICONV \uAC01 \u1100\u1161\u11A8
...
OCONV \u1100\u1161\u11A8 \uAC01
{code}
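
One way the requested support could look is sketched below. This is an illustration only, under stated assumptions: `ConversionTable`, `parse`, and `apply` are hypothetical names and not part of Lucene's HunspellDictionary. The sketch also assumes that each mapping is a three-field line (keyword, pattern, replacement), that any other line can be skipped, and that the longest matching pattern wins at each position.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of Lucene: holds the ICONV or OCONV pairs of an affix file. */
class ConversionTable {
    private final Map<String, String> pairs = new HashMap<>();
    private int longestKey = 0;

    /** Collects "KEYWORD pattern replacement" lines for the given keyword; other lines are skipped. */
    static ConversionTable parse(List<String> affLines, String keyword) {
        ConversionTable table = new ConversionTable();
        for (String line : affLines) {
            String[] parts = line.trim().split("\\s+");
            // Only three-field lines are treated as mappings (an assumption of this sketch).
            if (parts.length == 3 && parts[0].equals(keyword)) {
                table.pairs.put(parts[1], parts[2]);
                table.longestKey = Math.max(table.longestKey, parts[1].length());
            }
        }
        return table;
    }

    /** Rewrites the input left to right, preferring the longest matching pattern at each position. */
    String apply(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            boolean matched = false;
            int maxLen = Math.min(longestKey, input.length() - i);
            for (int len = maxLen; len > 0 && !matched; len--) {
                String replacement = pairs.get(input.substring(i, i + len));
                if (replacement != null) {
                    out.append(replacement);
                    i += len;
                    matched = true;
                }
            }
            if (!matched) {
                // No pattern starts here: copy the character unchanged.
                out.append(input.charAt(i++));
            }
        }
        return out.toString();
    }
}
```

In such a design, the ICONV table would be applied to a token before stemming and the OCONV table to each resulting stem, so that the example above round-trips between the precomposed syllable and its Jamos.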
