lucy-user mailing list archives

From Peter Karman <>
Subject Re: [lucy-user] Can lucy do substring search?
Date Thu, 02 Feb 2012 02:22:31 GMT
Desilets, Alain wrote on 2/1/12 10:15 AM:
> Thx Peter. Would this incur the same performance problem as tokenizing the string on
> a character-by-character basis?

WildcardQuery is slower than a TermQuery, but its extra cost is paid entirely at
search time, whereas tokenizing the string character by character happens at
both index time and search time.
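To make the trade-off concrete, here is a toy in-memory model in Python. It is not Lucy's implementation or API: `ngrams`, `build_index`, `wildcard_search` and `ngram_search` are hypothetical names, and the trigram size is an arbitrary assumption. Character n-gram tokenization pays up front by storing many more terms per document, while the wildcard approach leaves the index untouched and instead tests every lexicon term against the pattern when the query runs.

```python
from collections import defaultdict
from fnmatch import fnmatchcase


def ngrams(text, n=3):
    """Overlapping character n-grams: the 'tokenize by character' approach."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def build_index(docs, tokenize):
    """Inverted index: term -> set of doc ids. All tokenizing cost is paid here."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def wildcard_search(index, pattern):
    """Search-time cost: every term in the lexicon is tested against the pattern."""
    hits = set()
    for term, postings in index.items():
        if fnmatchcase(term, pattern):
            hits |= postings
    return hits


def ngram_search(index, docs, substring, n=3):
    """Exact lookups on the query's n-grams, then verify contiguity per candidate."""
    grams = ngrams(substring, n)
    if not grams:  # query shorter than n: fall back to scanning the documents
        return {d for d, text in docs.items() if substring in text}
    candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # Sharing all n-grams does not guarantee a contiguous match, so confirm it.
    return {d for d in candidates if substring in docs[d]}


docs = {1: "mail.example.com", 2: "www.example.org", 3: "example.com"}
whole = build_index(docs, lambda text: [text])  # one term per field value
grams = build_index(docs, ngrams)               # many terms per field value
print(len(whole), len(grams))
print(wildcard_search(whole, "*example.com"))
print(ngram_search(grams, docs, "example.com"))
```

Real engines can narrow a wildcard scan, for example by seeking to a literal prefix in a sorted lexicon, but a leading `*` (as in a hostname suffix match) generally rules that out.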

Your use case will incur a performance hit no matter what. In my apps, I
tokenize substrings at index time for only particular fields, and at search time
I do some term expansion using a custom lexicon instead of wildcards. IME, it's
about finding the balance in your architecture that best fits your actual use
cases. Accuracy vs. speed is one such balance. The use case you described
(finding all docs with a field matching a particular hostname) could be
accomplished with no change in indexing or tokenizing if you used the
WildcardQuery; whether that proves too slow depends on your requirements. Try it
and see.
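One possible reading of "index-time substring tokens plus search-time term expansion with a custom lexicon", sketched in Python. The suffix tokenizer and the `ExpansionLexicon` class are hypothetical illustrations, not Lucy classes, and splitting hostnames on dots is an assumed choice of "substrings" for that one field. The expanded list of full terms would then be searched as ordinary exact terms (in Lucy, presumably an ORQuery over TermQuery objects), so no pattern matching happens at query time.

```python
from collections import defaultdict


def hostname_suffixes(host):
    """Index-time substring tokens for one field only: every dot-delimited suffix."""
    labels = host.lower().split(".")
    return [".".join(labels[i:]) for i in range(len(labels))]


class ExpansionLexicon:
    """Hypothetical custom lexicon: substring token -> full indexed terms."""

    def __init__(self):
        self._expansions = defaultdict(set)

    def add(self, full_term):
        # Built once at index time, alongside the normal index.
        for token in hostname_suffixes(full_term):
            self._expansions[token].add(full_term)

    def expand(self, query_term):
        # A single dictionary lookup instead of a wildcard scan over all terms.
        return sorted(self._expansions.get(query_term.lower(), ()))


lex = ExpansionLexicon()
for host in ["mail.example.com", "www.example.com", "www.example.org"]:
    lex.add(host)

print(lex.expand("example.com"))
print(lex.expand("ample.com"))
```

The trade-off shows up directly: `expand("ample.com")` finds nothing because only label-aligned suffixes were indexed, which is the accuracy-for-speed balance described above.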
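"Try it and see" can start with a rough benchmark on your own data. The sketch below uses synthetic hostnames (an assumption, not real data) and Python's `fnmatch` as a stand-in for a wildcard matcher, comparing a full lexicon scan for a suffix pattern against a single exact-term lookup. Absolute numbers depend on the machine and say nothing precise about Lucy's own performance; only the relative scaling is of interest.

```python
import time
from fnmatch import fnmatchcase


def make_lexicon(n):
    """Synthetic stand-in for one field's sorted term list (hypothetical data)."""
    return sorted(f"host{i}.site{i % 97}.example.com" for i in range(n))


def wildcard_terms(lexicon, pattern):
    """Every term is tested against the pattern, so cost grows with the lexicon."""
    return [t for t in lexicon if fnmatchcase(t, pattern)]


def time_it(fn, repeat=5):
    """Best-of-N wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


lexicon = make_lexicon(20_000)
term_set = set(lexicon)
wild = time_it(lambda: wildcard_terms(lexicon, "*.site7.example.com"))
exact = time_it(lambda: "host7.site7.example.com" in term_set)
print(f"wildcard scan: {wild:.4f}s  exact lookup: {exact:.6f}s")
```

Re-running with a larger `make_lexicon` argument shows whether the scan time grows roughly linearly, which is the part that matters when deciding whether a wildcard is fast enough.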

Peter Karman
