lucy-user mailing list archives

From Marvin Humphrey
Subject Re: [lucy-user] Can lucy do substring search?
Date Wed, 01 Feb 2012 00:24:21 GMT
On Tue, Jan 31, 2012 at 01:19:38PM -0500, Desilets, Alain wrote:
> I was wondering if there was a way to tokenize the string into individual
> characters instead, and whether that is advisable from a performance point
> of view.

You can experiment with changing the 'pattern' argument to RegexTokenizer#new
to be '.' or '\\S'.  It will definitely be worse from a performance
standpoint, as matching a URL will now require a PhraseQuery with one term for
each letter rather than one term for each component matching \w+ in the URL,
and these terms will exist in virtually every document.
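
To make the term-count difference concrete, here is a rough sketch using
Python's re module as a stand-in for the tokenizer. It is not Lucy code; the
tokenize() helper and the example URL are made up for illustration, and it
only counts the tokens each pattern would produce, which is the number of
terms a phrase match on the URL would need.

```python
import re

def tokenize(pattern, text):
    """Return every non-overlapping match of `pattern` in `text`, in order."""
    return re.findall(pattern, text)

# Hypothetical URL, chosen only for illustration.
url = "http://example.com/foo/bar"

# One token per run of word characters.
word_terms = tokenize(r"\w+", url)

# One token per non-whitespace character.
char_terms = tokenize(r"\S", url)

print(len(word_terms), word_terms)
print(len(char_terms))
```

With '\w+' the URL yields five terms (http, example, com, foo, bar); with '\S'
it yields one term per character. Note that '.' would also emit whitespace
characters as tokens, while '\S' skips them.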

Marvin Humphrey
