lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erik Hatcher <>
Subject Re: Multiple Language Indexing and Searching
Date Tue, 06 Sep 2005 12:51:02 GMT

On Sep 6, 2005, at 7:15 AM, Hacking Bear wrote:

> On 9/6/05, Olivier Jaquemet <> wrote:
>> As far as your usage is concerned, it seems to be the right approach,
>> and I think the StandardAnalyzer does the job pretty right when it  
>> has
>> to deal with whatever language you want.
>  I should look into exactly what it does. Does this  
> StandardAnalyzer handle
> non-European languages like Chinese?

StandardAnalyzer recognizes the CJK range of characters and emits  
them each as individual tokens.  This is not an ideal way to deal  
with Chinese (不好 :) - but it does at least maintain the characters  
rather than throwing them away or blurring consecutive characters  


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message