lucene-java-user mailing list archives

From Nils Knappmeier
Subject Chinese sorting
Date Wed, 17 Dec 2014 11:54:05 GMT

Is there any implementation of a Chinese collator in Lucene? I've seen
that there is a Chinese analyzer that uses Hidden Markov Models, but
sorting seems to be a separate issue, and my googling hasn't turned up
any results yet.

I understand that this is not a trivial issue, and I've read that
Chinese users tend to prefer orderings other than by name, because the
sort orders are so complicated that nobody wants to use them. But we
will have to sort search results by name, even when the name is Chinese
(Simplified Chinese at the moment, but Traditional may also appear
later). Currently, Chinese words seem to be ordered by their Unicode
code point values, which does not seem to be the right order.
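
To illustrate the difference, here is a minimal sketch that uses only the
JDK's java.text.Collator, not any Lucene API. The class name, method
names and sample surnames are hypothetical, chosen for illustration.
Whether the JDK's rules for Locale.SIMPLIFIED_CHINESE produce an order
that readers would accept is an assumption, and it may vary between JDK
versions, so the collated output is deliberately not stated.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class ChineseSortSketch {

    // Plain String ordering: compares UTF-16 code units, which for these
    // characters is the same as ordering by Unicode code point.
    static List<String> sortByCodeUnit(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(Comparator.naturalOrder());
        return copy;
    }

    // Locale-aware ordering: delegates to whatever collation rules the
    // running JDK ships for the given locale.
    static List<String> sortWithCollator(List<String> names, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        List<String> copy = new ArrayList<>(names);
        copy.sort(collator);
        return copy;
    }

    public static void main(String[] args) {
        // Four common surnames, written as escapes to avoid source-encoding issues:
        // \u5f20 = Zhang, \u738b = Wang, \u674e = Li, \u8d75 = Zhao
        List<String> names = List.of("\u5f20", "\u738b", "\u674e", "\u8d75");

        System.out.println("code unit order: " + sortByCodeUnit(names));
        System.out.println("collator order:  "
                + sortWithCollator(names, Locale.SIMPLIFIED_CHINESE));
    }
}
```

The code unit sort puts Zhang (U+5F20) before Li (U+674E), whereas a
pinyin-based order would put Li first. That is the kind of mismatch I
mean.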

Thanks in advance for any suggestion,
