lucene-java-user mailing list archives

From: Phil Whelan
Subject: Re: Searching doubt
Date: Tue, 04 Aug 2009 15:19:06 GMT

On Tue, Aug 4, 2009 at 3:56 AM, Shai Erera wrote:
> 2) Use a dictionary (real dictionary), and search it for every substring,
> e.g. "a", "ab", "abo" ... "about" etc. If you find a match, split it there.
> This needs some fine tuning, like checking if the rest is also a word and if
> the full string is also a word, so that you don't break up meaningful words.
> You'll need to get a dictionary for that.

I do not have a solution to this, but it strikes me as very similar to
the way you traverse Japanese text to break it into words, since
Japanese has no spaces. Is there a Japanese tokenizer and, if so, does
it handle this? If so, you could replace the Japanese dictionary with
an English dictionary. Just a random thought I had that might or might
not help.
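
For illustration only, here is a rough sketch of the dictionary-based
splitting described in the quoted text. It is not taken from Lucene or
from any existing tokenizer: the class name DictionarySplitter and its
split method are hypothetical, and the dictionary is assumed to be a
plain in-memory set of lowercase words. It leaves a string alone when
the whole string is already a word, and otherwise tries the longest
dictionary prefix first, accepting a split only if the remainder can
also be broken into words.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Lucene: splits a run-together string
// into dictionary words, e.g. "aboutus" into "about" and "us".
class DictionarySplitter {
    private final Set<String> dictionary;
    // Caches results per substring, including null for "no split found".
    private final Map<String, List<String>> memo = new HashMap<>();

    DictionarySplitter(Set<String> dictionary) {
        this.dictionary = dictionary;
    }

    /** Returns the words covering text, or null if no full split exists. */
    List<String> split(String text) {
        if (text.isEmpty()) {
            return new ArrayList<>();
        }
        // If the full string is itself a word, do not break it up.
        if (dictionary.contains(text)) {
            List<String> whole = new ArrayList<>();
            whole.add(text);
            return whole;
        }
        if (memo.containsKey(text)) {
            return memo.get(text);
        }
        List<String> result = null;
        // Try the longest proper prefix first, then shorter ones.
        for (int end = text.length() - 1; end >= 1; end--) {
            String prefix = text.substring(0, end);
            if (!dictionary.contains(prefix)) {
                continue;
            }
            // Accept the split only if the rest also breaks into words.
            List<String> rest = split(text.substring(end));
            if (rest != null) {
                result = new ArrayList<>();
                result.add(prefix);
                result.addAll(rest);
                break;
            }
        }
        memo.put(text, result);
        return result;
    }
}
```

Preferring the longest prefix is only one possible tie-breaking rule;
a real tokenizer for a language without spaces would more likely score
competing segmentations rather than take the first one that fits.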
