lucene-dev mailing list archives

From "Christian Moen (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-3922) Add Japanese Kanji number normalization to Kuromoji
Date Mon, 02 Feb 2015 12:52:35 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christian Moen updated LUCENE-3922:
-----------------------------------
    Attachment: LUCENE-3922.patch

Updated patch with decimal number support and additional javadoc; the test code now makes
precommit happy.

Token attributes such as part-of-speech, readings, etc. for the normalized token are currently
inherited from the last token used when composing the normalized number. Since these values
are likely to be wrong, I'm inclined to set these attributes to null or a reasonable default.

I'm very happy to hear your thoughts on this.
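
To make the attribute question concrete, here is a minimal sketch of the two options in plain Java. It does not use Lucene's attribute API: the Token class, both compose methods and the attribute values in the usage note are hypothetical and purely illustrative.

```java
import java.util.List;

final class NormalizedTokenSketch {

    /** Hypothetical stand-in for a token and its attributes (not a Lucene class). */
    static final class Token {
        final String surface;
        final String partOfSpeech; // null means "unset"
        final String reading;      // null means "unset"

        Token(String surface, String partOfSpeech, String reading) {
            this.surface = surface;
            this.partOfSpeech = partOfSpeech;
            this.reading = reading;
        }
    }

    private NormalizedTokenSketch() {}

    /** Current behaviour as described: attributes are taken from the last composing token. */
    public static Token composeInheriting(String normalized, List<Token> parts) {
        Token last = parts.get(parts.size() - 1);
        return new Token(normalized, last.partOfSpeech, last.reading);
    }

    /** Alternative under discussion: clear the attributes on the normalized token. */
    public static Token composeResetting(String normalized) {
        return new Token(normalized, null, null);
    }
}
```

If a number were composed from two tokens with the illustrative readings "juu" and "ni", composeInheriting would give the normalized token "12" the reading "ni", which describes only the last part. composeResetting leaves the reading unset instead.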



> Add Japanese Kanji number normalization to Kuromoji
> ---------------------------------------------------
>
>                 Key: LUCENE-3922
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3922
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/analysis
>    Affects Versions: 4.0-ALPHA
>            Reporter: Kazuaki Hiraga
>            Assignee: Christian Moen
>              Labels: features
>             Fix For: 5.1
>
>         Attachments: LUCENE-3922.patch, LUCENE-3922.patch, LUCENE-3922.patch, LUCENE-3922.patch, LUCENE-3922.patch, LUCENE-3922.patch, LUCENE-3922.patch
>
>
> Japanese people often use Kanji numerals instead of Arabic numerals when writing prices,
> addresses and so on, e.g. 12万4800円 (124,800 JPY), 二番町三ノ二 (3-2 Nibancho) and
> 十二月 (December). So, we would like to normalize those Kanji numerals to Arabic numerals
> (I don't think we need the capability to normalize to Kanji numerals).
>  
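
The normalization requested above can be sketched in a few lines of plain Java. This is not the code in the attached patch: the class and method names are hypothetical, it handles non-negative integers only (no decimals, no full-width digits, no overflow checks), and it rewrites raw strings, whereas a real filter would presumably work on tokens so that numerals inside ordinary words are left alone. Unicode escapes are used so the source compiles regardless of file encoding.

```java
final class KanjiNumberSketch {

    // Kanji digits zero through nine: U+3007, then ichi, ni, san, shi, go, roku, shichi, hachi, kyuu.
    private static final String KANJI_DIGITS =
        "\u3007\u4e00\u4e8c\u4e09\u56db\u4e94\u516d\u4e03\u516b\u4e5d";

    private KanjiNumberSketch() {}

    /** Returns 0-9 for an ASCII or Kanji digit, or -1 for anything else. */
    static int digitValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        return KANJI_DIGITS.indexOf(c);
    }

    /** Multipliers that apply within a group of four digits; 0 if c is not one. */
    static long smallUnit(char c) {
        switch (c) {
            case '\u5341': return 10L;    // juu (ten)
            case '\u767e': return 100L;   // hyaku (hundred)
            case '\u5343': return 1000L;  // sen (thousand)
            default: return 0L;
        }
    }

    /** Multipliers that close a group of four digits; 0 if c is not one. */
    static long largeUnit(char c) {
        switch (c) {
            case '\u4e07': return 10_000L;             // man
            case '\u5104': return 100_000_000L;        // oku
            case '\u5146': return 1_000_000_000_000L;  // chou
            default: return 0L;
        }
    }

    static boolean isNumeralChar(char c) {
        return digitValue(c) >= 0 || smallUnit(c) > 0 || largeUnit(c) > 0;
    }

    /** Parses a run made up only of numeral characters into its integer value. */
    public static long parse(String numeral) {
        long total = 0;    // value of the groups already closed by a large unit
        long section = 0;  // value of the current group (below 10,000)
        long current = 0;  // digits read since the last unit character
        for (int i = 0; i < numeral.length(); i++) {
            char c = numeral.charAt(i);
            int d = digitValue(c);
            if (d >= 0) {
                current = current * 10 + d;  // positional digits such as "4800"
            } else if (smallUnit(c) > 0) {
                section += Math.max(current, 1) * smallUnit(c);  // bare "ten" means 1 x 10
                current = 0;
            } else if (largeUnit(c) > 0) {
                section += current;
                total += Math.max(section, 1) * largeUnit(c);
                section = 0;
                current = 0;
            } else {
                throw new IllegalArgumentException("Not a numeral character: " + c);
            }
        }
        return total + section + current;
    }

    /** Replaces every run of numeral characters in text with its Arabic form. */
    public static String normalize(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            if (!isNumeralChar(text.charAt(i))) {
                out.append(text.charAt(i++));
                continue;
            }
            int start = i;
            while (i < text.length() && isNumeralChar(text.charAt(i))) i++;
            out.append(parse(text.substring(start, i)));
        }
        return out.toString();
    }
}
```

With this sketch, the three examples from the description normalize to 124800円, 2番町3ノ2 and 12月 respectively.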




