lucy-user mailing list archives

From "Desilets, Alain" <>
Subject RE: [lucy-user] Can lucy do substring search?
Date Thu, 02 Feb 2012 14:41:27 GMT
Even if I did that, I would still need to search the domain as a non-exact value. For example,
I might want to search on * to search all Government of Canada web sites, or search
on * to search only on the site of the National Research Council of Canada,
or limit the search to any web site in Canada, like *.ca.
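
One way to turn suffix patterns such as *.ca into exact-match terms is to index every dot-separated suffix of the hostname. The sketch below is plain Python and does not use Lucy's API; the function names and the example hosts are hypothetical, chosen only for illustration.

```python
def domain_suffixes(hostname):
    """Return every dot-separated suffix of a hostname, shortest first."""
    labels = hostname.lower().strip(".").split(".")
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]


def matches_site(hostname, pattern):
    """Check a pattern such as '*.ca' or 'example.ca' against a hostname.

    Assumption: a leading '*.' only means "any host ending in this
    suffix", so '*.ca' and 'ca' are treated alike here.
    """
    suffix = pattern.lower()
    if suffix.startswith("*."):
        suffix = suffix[2:]
    # Comparing whole suffixes respects label boundaries, so 'ca'
    # cannot match a host that merely ends in the letters "ca".
    return suffix in domain_suffixes(hostname)


if __name__ == "__main__":
    print(domain_suffixes("www.example.ca"))
    print(matches_site("www.example.ca", "*.ca"))
    print(matches_site("www.example.org", "*.ca"))
```

If each suffix were stored as a separate term in a non-analyzed field, a search for *.ca would reduce to an exact term lookup on "ca" and need no wildcard support.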


-----Original Message-----
From: Peter Karman [] 
Sent: Thursday, February 02, 2012 9:27 AM
To: ''
Subject: Re: [lucy-user] Can lucy do substring search?

On 2/2/12 7:40 AM, Desilets, Alain wrote:
> Thx Peter. In my case, the fields on which I need to do wild-card searches are fields
> that specify the URL of a document. I want to be able to use this to limit the search to documents
> which are on specific web sites.
> It seems the best balance in that case, between accuracy and speed, would be to tokenize
> on non word character. Then, I could retrieve a superset of docs on say,,
> by searching for "" (with a QueryParser). This might accidentally retrieve
> docs whose urls contain www/somewhwere/org (for example), but I would do a second pass to
> filter the docs whose url do not match the actual expression I would need
> to do this second pass anyway, even if I was using a WildCard search, because, I might accidentally
> match a URL that has in a different part than the IP name (ex: http:/

why not pull the hostname out at indexing time into its own field? then 
your particular use case should get no false positives?
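
A minimal sketch of that idea, in plain Python rather than Lucy's own API: parse the hostname out of each URL before indexing and store it as its own field. The field names ("url", "host", "content") and the example URL are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def hostname_of(url):
    """Return the lowercased hostname of a URL, or "" if it has none.

    Note: urlsplit only recognises a host when the URL has a scheme
    (or starts with "//"); a bare "www.example.ca/page" yields "".
    """
    return urlsplit(url).hostname or ""


def build_doc(url, content):
    """Build the record to index, with the host in a separate field."""
    return {"url": url, "host": hostname_of(url), "content": content}


if __name__ == "__main__":
    # The path contains host-like words, but only the real host is extracted.
    doc = build_doc("http://www.example.ca/docs/www/somewhere/org", "text")
    print(doc["host"])
```

Because the host field holds only the hostname, a site restriction never matches host-like strings that appear in the path or query part of a URL.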

Peter Karman
