lucy-user mailing list archives

From Peter Karman <>
Subject Re: [lucy-user] Can lucy do substring search?
Date Thu, 02 Feb 2012 14:26:57 GMT
On 2/2/12 7:40 AM, Desilets, Alain wrote:
> Thx Peter. In my case, the fields on which I need to do wildcard searches are fields
> that specify the URL of a document. I want to be able to use this to limit the search to documents
> which are on specific web sites.
> It seems the best balance in that case, between accuracy and speed, would be to tokenize
> on non-word characters. Then, I could retrieve a superset of docs on say,,
> by searching for "" (with a QueryParser). This might accidentally retrieve
> docs whose URLs contain www/somewhere/org (for example), but I would do a second pass to
> filter out the docs whose URLs do not match the actual expression. I would need
> to do this second pass anyway, even if I was using a WildCard search, because I might accidentally
> match a URL that has in a different part than the IP name (ex: http:/
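The two-pass idea quoted above can be sketched in plain Python, without Lucy itself. This is an illustrative sketch under stated assumptions: the tokenizer splits on runs of non-word characters, a quoted QueryParser query is modeled as a consecutive-token (phrase) match, and the second pass compares the parsed hostname exactly. The function names and the example URLs are hypothetical (the real URLs were lost from the archived message), and the in-memory list only stands in for what the index would do.

```python
import re
from urllib.parse import urlsplit


def tokenize(text):
    """Lowercase and split on runs of non-word characters, dropping empty tokens."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def contains_phrase(tokens, phrase):
    """True if `phrase` appears as a consecutive run in `tokens` (phrase-query semantics)."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def search_by_host(urls, host):
    """Return (first-pass candidates, second-pass exact matches) for `host`."""
    phrase = tokenize(host)
    # Pass 1: cheap token match, standing in for the index query; may over-match.
    candidates = [u for u in urls if contains_phrase(tokenize(u), phrase)]
    # Pass 2: exact comparison against the parsed hostname removes false positives.
    matches = [u for u in candidates if urlsplit(u).hostname == host.lower()]
    return candidates, matches


# Hypothetical URLs; two of them contain the host tokens outside the hostname.
urls = [
    "http://www.example.org/a",
    "http://mirror.net/www/example/org/page",
    "http://other.com/?ref=www.example.org",
]
candidates, matches = search_by_host(urls, "www.example.org")
print(len(candidates))  # 3
print(matches)          # ['http://www.example.org/a']
```

All three URLs survive the first pass because each contains the tokens `www`, `example`, `org` in sequence; only the second pass, which looks at the actual hostname, narrows the result to one.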

Why not pull the hostname out at indexing time into its own field? Then
your particular use case should get no false positives.
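Extracting the hostname at indexing time can be sketched the same way. In Lucy this would presumably be a separate field using a non-tokenized type such as `Lucy::Plan::StringType`; that mapping is an assumption, and the class below is a hypothetical in-memory stand-in, not Lucy's API. The point it illustrates: because the whole hostname is stored as one term, a lookup is an exact match and needs no second-pass filter.

```python
from collections import defaultdict
from urllib.parse import urlsplit


class HostFieldIndex:
    """Hypothetical in-memory stand-in for an index with an untokenized 'host' field."""

    def __init__(self):
        self.docs = []
        self.by_host_term = defaultdict(list)  # exact hostname -> list of doc ids

    def add(self, url, content=""):
        doc_id = len(self.docs)
        # Extract the hostname once, at indexing time (urlsplit lowercases it).
        host = urlsplit(url).hostname or ""
        self.docs.append({"url": url, "host": host, "content": content})
        # Store the whole hostname as a single term, so lookups are exact matches.
        self.by_host_term[host].append(doc_id)
        return doc_id

    def search_host(self, host):
        """Return URLs whose hostname equals `host` exactly (case-insensitive)."""
        return [self.docs[i]["url"] for i in self.by_host_term.get(host.lower(), [])]


idx = HostFieldIndex()
for u in [
    "http://www.example.org/a",
    "http://mirror.net/www/example/org/page",
    "http://other.com/?ref=www.example.org",
]:
    idx.add(u)

print(idx.search_host("www.example.org"))  # ['http://www.example.org/a']
```

One design consequence: an exact host term does not match subdomains, so a query for `example.org` would miss `www.example.org`. If that matters, one option is to index each dot-suffix of the hostname as an additional term in the same field.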

Peter Karman  .  .
