lucene-solr-user mailing list archives

From Mike Klaas <mike.kl...@gmail.com>
Subject Re: Re[2]: startsWith?
Date Tue, 06 May 2008 02:10:13 GMT

On 3-May-08, at 10:44 PM, JLIST wrote:

> Hello Otis,
>
> Do you mean that if I index the URL as a "text" field, I'll
> be able to do * for a given prefix because the text will be
> tokenized at the "/" and should suffice for my need?

I'm not sure what your needs are, but I use the following to index urls:

     <fieldType name="reverse_domain" class="solr.TextField">
       <analyzer>
         <tokenizer class="solr.PatternTokenizerFactory" pattern="\."/>
         <filter class="solr.LowerCaseFilterFactory"/>
       </analyzer>
     </fieldType>

(This field stores the _reversed domain_: www.example.com is indexed as "com.example.www".)
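For illustration, here is a minimal Python sketch of how the reversed-domain value could be computed before a document is sent to Solr. The helper name `reverse_domain` is hypothetical (it is not part of Solr, and not necessarily how the value was produced here); it only uses the standard library.

```python
from urllib.parse import urlsplit


def reverse_domain(url: str) -> str:
    """Return the host of `url` with its dot-separated labels reversed.

    Hypothetical helper for illustration only.
    """
    # urlsplit(...).hostname is already lowercased and excludes any port;
    # it is None when the URL has no network location.
    host = urlsplit(url).hostname
    if not host:
        raise ValueError(f"no host in URL: {url!r}")
    return ".".join(reversed(host.split(".")))


print(reverse_domain("http://www.example.com/some/page.html"))
```

The returned string is what would go into the reverse_domain field; the URL itself would go into a separate field.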

I also store the url in a textTight field (see the example schema).  If you
want to do prefix matching on the url, I recommend storing it untokenized
in another field (or with minimal analysis, such as lowercasing).
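The reason is that a prefix query is matched against individual indexed terms, not against the original string. The Python sketch below models that with plain string operations. The splitting rule in `tokenized_terms` is a rough stand-in for a tokenizing analyzer, not the actual textTight analysis chain, and all function names are hypothetical.

```python
import re


def tokenized_terms(url: str) -> list[str]:
    # Rough stand-in for a tokenized field: lowercase, then split on
    # anything that is not a letter or digit (an assumption, not textTight).
    return [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]


def untokenized_terms(url: str) -> list[str]:
    # Untokenized field with lowercasing only: one term per document.
    return [url.lower()]


def prefix_hits(terms: list[str], prefix: str) -> bool:
    # A prefix query matches if any single indexed term starts with the prefix.
    return any(term.startswith(prefix) for term in terms)


url = "http://www.Example.com/docs/intro.html"
prefix = "http://www.example.com/docs"

print(prefix_hits(tokenized_terms(url), prefix))    # False
print(prefix_hits(untokenized_terms(url), prefix))  # True
```

In the tokenized case no single term is long enough to start with the whole prefix, so only the untokenized field can answer a "starts with" question about the full url.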

If, like me, you want to restrict documents to a certain domain and
its subdomains, you have to be careful with your query:

reverse_domain:com.example reverse_domain:com.example.*

If you just do reverse_domain:com.example*, you will also match
www.example-foo.com (reversed: com.example-foo.www), which you don't want.
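To make the difference concrete, here is a small Python sketch that treats each stored reverse_domain value as a plain string. That is a simplification of how Solr analyzes and matches the field, and the names `build_domain_filter`, `matches_naive` and `matches_safe` are hypothetical. The two-clause query also assumes the clauses are combined with OR (the default operator in the example schema, as far as I know).

```python
def build_domain_filter(domain: str, field: str = "reverse_domain") -> str:
    """Build the two-clause query for a domain and its subdomains."""
    rev = ".".join(reversed(domain.lower().split(".")))
    # Exact domain OR anything below it; note the "." before the wildcard.
    return f"{field}:{rev} {field}:{rev}.*"


def matches_naive(value: str, rev: str) -> bool:
    # Models reverse_domain:com.example*
    return value.startswith(rev)


def matches_safe(value: str, rev: str) -> bool:
    # Models reverse_domain:com.example reverse_domain:com.example.*
    return value == rev or value.startswith(rev + ".")


stored = ["com.example", "com.example.www", "com.example-shop.www"]

print(build_domain_filter("example.com"))
print([v for v in stored if matches_naive(v, "com.example")])
print([v for v in stored if matches_safe(v, "com.example")])
```

The naive prefix keeps all three values, including the unrelated com.example-shop.www, while the two-clause form keeps only com.example and com.example.www.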

-Mike
