lucene-dev mailing list archives

From "Jack Krupansky (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
Date Sat, 27 Apr 2013 13:14:16 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643659#comment-13643659 ]

Jack Krupansky commented on LUCENE-4956:
----------------------------------------

As a user trying to browse and find analyzers and tokenizers for specific languages, I object.
I mean, I should be able to look at the language code and guess which module it might be in.
It's one thing if the module name is reasonably general and there is a reasonable expectation
that average users would readily associate it with specific languages, or that it groups languages
by category. But giving the module an artificial name that would not be obvious to an average
user seems like a poor choice to me.

Even if you just called the module "korean", at least that would be a helpful guide to people
like me browsing the list of modules, and then the package name could distinguish the implementations
for that language.

Also, it should be possible to mix multiple implementations for the same language in the same
application, so the package name does need some unique name, unless there is guaranteed
to be only one implementation for that language.

I would suggest that there should be two choices for language-based analysis modules:

1. Category name, for a general approach that covers a number of languages that
need to share classes.
2. Language code, a hyphen, and some arbitrary name, for implementations that cover only a single
language.

Even for #1, I would suggest a prefix that indicates the "type" of languages
covered (Eastern European, Asian, etc.).

That said, I would not stand in the way of adding Korean analysis as soon as possible. I mean,
this contribution shouldn't have to correct all of the sins of past contributions.

                
> the korean analyzer that has a korean morphological analyzer and dictionaries
> -----------------------------------------------------------------------------
>
>                 Key: LUCENE-4956
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4956
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/analysis
>    Affects Versions: 4.2
>            Reporter: SooMyung Lee
>              Labels: newbie
>         Attachments: kr.analyzer.4x.tar
>
>
> The Korean language has specific characteristics. When developing a search service in Korean
> with Lucene & Solr, there are some problems with searching and indexing. The Korean analyzer
> solves these problems with a Korean morphological analyzer. It consists of a Korean morphological
> analyzer, dictionaries, a Korean tokenizer and a Korean filter. The Korean analyzer is made
> for Lucene and Solr. If you develop a search service in Korean with Lucene, the Korean analyzer
> is the best choice.

--
This message is automatically generated by JIRA.


