lucene-java-user mailing list archives

From saisantoshi <>
Subject Is StandardAnalyzer good enough for multi languages...
Date Tue, 08 Jan 2013 19:30:25 GMT
Does the Lucene StandardAnalyzer work for all languages when tokenizing before
indexing (since we are using Java, I think the content is converted to UTF-8
before tokenizing/indexing)? Or do we need to use a special analyzer for each
language? In that case, if a document has mixed content (English +
Japanese), which analyzer should we use, and how can we figure that out
dynamically before indexing?

Also, while searching, if the query text contains both English and
Japanese, how does this work? Are there any criteria for choosing the analyzer?
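For reference, a quick way to see what StandardAnalyzer actually produces for mixed-language input is to dump its token stream. This is a minimal sketch against the Lucene analysis API; the field name "f", the class name, and the sample text are arbitrary choices for illustration:

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class AnalyzerCheck {
    // Collect the tokens an analyzer emits for the given text.
    static List<String> tokenize(Analyzer analyzer, String text) throws Exception {
        List<String> tokens = new ArrayList<>();
        try (TokenStream ts = analyzer.tokenStream("f", text)) { // field name is arbitrary
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                tokens.add(term.toString());
            }
            ts.end();
        }
        return tokens;
    }

    public static void main(String[] args) throws Exception {
        try (Analyzer analyzer = new StandardAnalyzer()) {
            // StandardAnalyzer segments by Unicode word-break rules (UAX#29) and
            // lower-cases terms; CJK ideographs typically come out as one token
            // per character rather than as words.
            System.out.println(tokenize(analyzer, "Lucene search 全文検索"));
        }
    }
}
```

Because StandardAnalyzer falls back to per-character tokens for ideographic text, mixed English + Japanese documents are usually better served by a language-aware analyzer for the Japanese portions (e.g. the Kuromoji-based JapaneseAnalyzer in lucene-analyzers-kuromoji), often combined with per-field analyzer selection.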

