lucene-java-user mailing list archives

From saisantoshi <>
Subject Re: Is StandardAnalyzer good enough for multi languages...
Date Wed, 09 Jan 2013 18:23:41 GMT
Thanks for all the responses. From the above, it sounds like there are two options:

1. Use ICUTokenizer (is it in Lucene 4.0 or 4.1?). If it's in 4.1, then we
cannot use it at this time, since 4.1 has not been released yet.

2. Write a custom analyzer by extending StandardAnalyzer and adding filters
for additional languages.
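The underlying difficulty with option 2 can be illustrated without Lucene at all. The sketch below is a hypothetical, standalone example (not Lucene code, and not what StandardAnalyzer actually does internally): it shows why space-delimited tokenization, which works for Western languages, cannot segment Japanese text, so an analyzer must handle CJK segmentation explicitly. The class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class CjkTokenizationDemo {
    // Naive whitespace tokenization, similar in spirit to a
    // space-delimited tokenizer for Western languages.
    // (Illustrative only -- StandardAnalyzer's real tokenizer is
    // grammar-based and more sophisticated than this.)
    static List<String> whitespaceTokenize(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        // English: spaces separate words, so this yields 3 tokens.
        System.out.println(whitespaceTokenize("full text search").size());

        // Japanese: there are no spaces between words, so the whole
        // phrase comes back as a single token. Keyword searches on
        // individual words then fail unless the analysis chain
        // segments CJK text itself (e.g. via an ICU- or dictionary-
        // based tokenizer).
        System.out.println(whitespaceTokenize("全文検索エンジン").size());
    }
}
```

This is why the choice of tokenizer, not just added filters, matters for the Japanese keyword case described below.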

The problem we are currently facing is described in detail at:
To summarize: when documents are uploaded, users can type keywords in any
language, and we are having trouble tokenizing certain Japanese keyword
characters. As a result, searching on those specific keywords does not work
with StandardAnalyzer (2.4.0).

Can you suggest a filter for this that we could integrate into StandardAnalyzer?

