lucene-java-user mailing list archives

From "Lee Li Bin" <>
Subject RE: Lucene for chinese search
Date Mon, 18 Jun 2007 12:45:26 GMT

I am still having problems searching for Chinese words.
The XML file, which is the data source, and the analyzer have already been
encoded. I have tested StandardAnalyzer, CJKAnalyzer, and ChineseAnalyzer,
but I still can't get any results.

1. Do we need any encoding configuration in Apache Tomcat for Chinese
search using Lucene?

2. Do we need to use a JSP meta tag or page encoding? What encoding
should the JSP use?

Lee Li Bin

-----Original Message-----
From: Chris Lu [] 
Sent: Monday, June 18, 2007 2:10 AM
Subject: Re: Lucene for chinese search

There are three things to watch out for with Chinese or other CJK languages:

1. The content source or database needs to be encoded in UTF-8.
2. StandardAnalyzer doesn't support Chinese words well. Use either
ChineseAnalyzer or CJKAnalyzer. In my experience, CJKAnalyzer is a
little better.
3. The user's query should be encoded in UTF-8.
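
The three points above can be sketched with plain JDK calls. This is only an
illustration, not Lucene code: the class name Utf8SearchSketch and all of its
methods are made up for this example. It shows that the stored bytes and the
query both have to be decoded as UTF-8 to match, what happens when the same
bytes are read as ISO-8859-1, and roughly what overlapping two-character
tokens look like (as far as I know, that is the kind of token CJKAnalyzer
produces; the real tokenizer handles more cases than this helper does).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration; none of these names come from Lucene.
class Utf8SearchSketch {

    // Point 1: decode stored content with an explicit charset,
    // never with the platform default.
    static String decodeContent(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    // The failure mode: the same bytes read as ISO-8859-1 give one
    // meaningless character per byte, so no Chinese term can match.
    static String misdecode(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Point 3: decode a percent-encoded query parameter as UTF-8.
    static String decodeQuery(String rawParam) {
        try {
            return URLDecoder.decode(rawParam, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to exist on every JVM.
            throw new IllegalStateException(e);
        }
    }

    // Point 2, simplified: overlapping two-character tokens.
    // Assumes every character is a single Java char (no surrogate pairs).
    static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i + 1 < text.length(); i++) {
            tokens.add(text.substring(i, i + 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Unicode escapes keep this file independent of the compiler's source encoding.
        String term = "\u4e2d\u6587"; // "Chinese (language)"
        byte[] stored = term.getBytes(StandardCharsets.UTF_8);

        // Content and query agree only when both are decoded as UTF-8.
        String query = decodeQuery("%E4%B8%AD%E6%96%87");
        System.out.println(decodeContent(stored).equals(query));

        // Wrong charset: six garbage characters instead of two Chinese ones.
        System.out.println(misdecode(stored).length());

        // A four-character string yields three overlapping tokens.
        System.out.println(bigrams("\u4e2d\u6587\u641c\u7d22").size());
    }
}
```

If a search returns nothing with every analyzer, it is worth printing the
query string and a few indexed field values in this way first; text that is
already garbled before it reaches the analyzer will not match regardless of
which analyzer is used.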

Chris Lu
Instant Scalable Full-Text Search On Any Database/Application
Lucene Database Search in 3 minutes:

On 6/17/07, <> wrote:
> Hi,
> I would like to know whether StandardAnalyzer allows searching of Chinese
> words.
> Also, in order to support Chinese searching, is any particular encoding
> needed when developing the application?
> I'm currently using Jetty as the web server and JSP for the application;
> search results are saved in an XML file and displayed using XSL. Is an
> encoding needed for any of the files (XML, XSL, etc.) as well as when
> parsing the query?
> Thanks a lot
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
