lucene-java-user mailing list archives

From saisantoshi <>
Subject Lucene support for multi byte characters : 2.4.0 (version).
Date Tue, 08 Jan 2013 18:20:00 GMT
We are using Lucene (the 2.4.0 libraries) to implement search in our
application, with StandardAnalyzer as the analyzer.

Our application has a document upload feature which lets you upload
documents and attach some keywords while uploading. When we search using
those keywords, the search retrieves the matching documents.
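For concreteness, the flow above (keywords indexed at upload time, looked up at search time) can be sketched without Lucene at all, as a toy in-memory inverted map. The class and method names here are made up purely for illustration; they stand in for what IndexWriter/IndexSearcher do in the real application:

```java
import java.util.*;

// Toy sketch of the upload-then-search flow: keywords entered at upload
// time are indexed, and a search on a keyword retrieves matching doc ids.
public class KeywordIndexSketch {
    // keyword -> set of document ids uploaded with that keyword
    private final Map<String, Set<String>> index = new HashMap<>();

    void addDocument(String docId, List<String> keywords) {
        for (String kw : keywords) {
            index.computeIfAbsent(kw.toLowerCase(Locale.ROOT), k -> new HashSet<>())
                 .add(docId);
        }
    }

    Set<String> search(String keyword) {
        return index.getOrDefault(keyword.toLowerCase(Locale.ROOT), Set.of());
    }

    public static void main(String[] args) {
        KeywordIndexSketch idx = new KeywordIndexSketch();
        idx.addDocument("doc1", List.of("report", "東京"));
        idx.addDocument("doc2", List.of("report"));
        System.out.println(idx.search("report")); // both doc ids
        System.out.println(idx.search("東京"));   // doc1 only
    }
}
```

Exact keyword lookup like this is language-agnostic; the language sensitivity only enters once an analyzer starts splitting and normalizing the text, which is where our problem seems to be.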

The problem we are facing is that search works fine if the keywords are
in English or Simplified Chinese, but it is not working for Japanese.

I am not sure whether this is a problem with the analyzer we are using,
or whether Japanese characters are simply not supported in the 2.4.0
version. I did find some related discussion via a Google search.

We are not tokenizing the document itself; we are only tokenizing the
keywords added while uploading the document.
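To illustrate why tokenization is the language-sensitive step (this uses only the JDK's BreakIterator, not Lucene, and the sample Japanese string is mine): English words are delimited by spaces, but Japanese text has no spaces, so any whitespace-oriented tokenizer has nothing to split on and a language-aware segmenter is needed:

```java
import java.text.BreakIterator;
import java.util.*;

// Word segmentation with the JDK's locale-aware BreakIterator.
// English splits cleanly on spaces; Japanese has no spaces, so
// segmentation depends entirely on the segmenter's language rules.
public class TokenizeDemo {
    static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String piece = text.substring(start, end);
            if (!piece.isBlank()) out.add(piece); // drop whitespace-only pieces
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("upload search", Locale.ENGLISH)); // [upload, search]
        System.out.println(words("全文検索", Locale.JAPANESE)); // segmentation varies by JDK
    }
}
```

If the analyzer producing index terms for Japanese keywords segments differently from (or worse than) the one used at query time, the query terms will never match the indexed terms, which would explain exactly the symptom we see.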

document.add(new Field(field.getKeyword(), value, Field.Store.NO,
    Field.Index.ANALYZED)); // Index mode assumed; the original line was truncated here

Do you think upgrading to the latest version of Lucene would solve the
issue? Or do we need to use special analyzers for each specific language?
Does StandardAnalyzer not support Unicode characters?

Any thoughts on this are much appreciated.
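On the "multi byte" point in the subject line: Java Strings hold Unicode text regardless of how many bytes each character needs in a given encoding, so byte width by itself is unlikely to be the problem; a quick check (assuming UTF-8 as the encoding):

```java
import java.nio.charset.StandardCharsets;

// "Multi byte" only matters at the encoding boundary. Three Japanese
// characters occupy three Java chars but nine bytes in UTF-8.
public class MultiByteDemo {
    public static void main(String[] args) {
        String ja = "日本語";
        System.out.println(ja.length());                                // 3
        System.out.println(ja.getBytes(StandardCharsets.UTF_8).length); // 9
    }
}
```

So the question is really about analysis (how the text gets split into index terms) rather than about Unicode support as such.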

