lucene-solr-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: startsWith?
Date Sun, 04 May 2008 02:04:07 GMT
Hi,
Special tokenization is often a good thing to do with URLs.  For example, over on Simpy.com
you can do searches like this:

http://www.simpy.com/links/site/techcrunch.com
http://www.simpy.com/links/site/www.techcrunch.com

If you were to examine results, you'd see that the two result sets mostly overlap.  That is
because a "site" field contains both tokens.
Often people reverse the domain or hostname so they can do searches such as com*, and they
also do more sophisticated URL tokenizing.
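
The two ideas above can be sketched in a few lines of Python. This is only an illustration of the kind of tokens one might put into a "site" field, not how Simpy actually does it; the function names site_tokens and reversed_host are made up for this example.

```python
def site_tokens(host):
    """Return the tokens to index in a hypothetical "site" field.

    A host such as www.techcrunch.com yields both the full host and the
    host without the leading "www.", so a search on either form matches.
    """
    host = host.lower().rstrip(".")
    tokens = [host]
    if host.startswith("www."):
        tokens.append(host[4:])
    return tokens


def reversed_host(host):
    """Reverse the dot-separated labels of a host name.

    www.techcrunch.com becomes com.techcrunch.www, so a prefix search
    such as com* or com.techcrunch* matches on the reversed value.
    """
    labels = host.lower().rstrip(".").split(".")
    return ".".join(reversed(labels))
```

With the reversed form indexed in its own field, a prefix search like com.techcrunch* would cover the bare domain and all of its subdomains.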

Anyhow, for your needs you could also try something as simple as http://domain/articles/2008/*
to find docs with that URL prefix.
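
Here is a rough sketch of building such a prefix query string by hand. It assumes the URL is indexed untokenized in a field named "url" (a hypothetical field name) and that the query goes through the standard Lucene/Solr query parser, where characters such as ":" have special meaning and are escaped with a backslash. The helper names are made up for this example, and the escaping is deliberately conservative.

```python
# Characters with special meaning in Lucene query syntax, plus whitespace.
# Escaping a character that is not special is harmless, so this set errs
# on the side of escaping too much.
SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\/ ')


def escape_term(text):
    """Backslash-escape query syntax characters in a literal term."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)


def prefix_query(field, prefix):
    """Build a query matching values of `field` that start with `prefix`.

    The trailing * is left unescaped so the parser treats it as a wildcard.
    """
    return f"{field}:{escape_term(prefix)}*"


if __name__ == "__main__":
    print(prefix_query("url", "http://domain/articles/2008/"))
```

If the field is tokenized as text instead of stored as a single string, the prefix is matched against individual tokens and this will likely not behave like startsWith.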

Otis

--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: JLIST <jlist9@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Saturday, May 3, 2008 9:59:47 PM
> Subject: startsWith?
> 
> Hi, I wonder if it's possible to search for text/string fields that start
> with a substring, similar to Java's startsWith function? For example,
> if I have a URL indexed as a text or string field, can I find URLs that
> start with "http://domain/articles/2008/" ?
> 
> If not, what's the best way to implement a query like this? By
> splitting up the URL into sections and index them incrementally
> like below?
> 
> http://domain/
> http://domain/articles/
> http://domain/articles/2008/
> ...
> 
> 
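
For reference, the incremental splitting described in the quoted message could look roughly like the sketch below. The function name url_prefixes is made up, and treating a last path segment without a trailing slash as a file name (and so not as a prefix) is an assumption of this example.

```python
from urllib.parse import urlsplit


def url_prefixes(url):
    """Return the directory-style prefixes of a URL, shortest first."""
    parts = urlsplit(url)
    current = f"{parts.scheme}://{parts.netloc}/"
    prefixes = [current]

    segments = [s for s in parts.path.split("/") if s]
    # A last segment without a trailing slash is treated as a file name
    # and is not emitted as a prefix of its own.
    if segments and not parts.path.endswith("/"):
        segments = segments[:-1]

    for segment in segments:
        current += segment + "/"
        prefixes.append(current)
    return prefixes
```

Indexing each returned prefix as a separate value of a multi-valued string field would turn the startsWith check into an exact match on one of those values.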


