lucene-solr-user mailing list archives

From "Norskog, Lance" <la...@divvio.com>
Subject RE: Score of exact matches
Date Tue, 06 Nov 2007 19:34:15 GMT
How does this perform compared with searching against a single field?
My situation is millions of small records, averaging 200 bytes per
text field.

Lance 

-----Original Message-----
From: Walter Underwood [mailto:wunderwood@netflix.com] 
Sent: Monday, November 05, 2007 9:38 PM
To: solr-user@lucene.apache.org
Subject: Re: Score of exact matches

This is fairly straightforward and works well with the DisMax handler.
Index the text into three different fields with three different sets of
analyzers. Use something like this in the request handler:

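 <requestHandler name="multimatch" class="solr.DisMaxRequestHandler">
   <lst name="defaults">
     <!-- tie: how much the non-best field matches add to the score -->
     <float name="tie">0.01</float>
     <!-- qf: fields to query, with boosts (exact beats noaccent beats stemmed) -->
     <str name="qf">
       exact^16 noaccent^4 stemmed
     </str>
     <!-- pf: fields used to boost documents that match the query as a phrase -->
     <str name="pf">
       exact^16 noaccent^4 stemmed
     </str>
   </lst>
 </requestHandler>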

You will probably need to adjust the weights for your content, though I
expect these are a good starting place.
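
For reference, the three fields could be set up in schema.xml along these
lines. This is only a sketch under assumptions: the field type names
(text_exact, text_noaccent, text_stemmed) and the source field "title" are
made up for illustration, and the filter chains are one plausible choice
rather than a tested configuration. Check that each factory is available
in your Solr version.

 <!-- inside <types>: one field type per level of normalization -->
 <fieldType name="text_exact" class="solr.TextField">
   <analyzer>
     <!-- no filters: case and accents are preserved -->
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
   </analyzer>
 </fieldType>
 <fieldType name="text_noaccent" class="solr.TextField">
   <analyzer>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.ISOLatin1AccentFilterFactory"/>
   </analyzer>
 </fieldType>
 <fieldType name="text_stemmed" class="solr.TextField">
   <analyzer>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.ISOLatin1AccentFilterFactory"/>
     <filter class="solr.EnglishPorterFilterFactory"/>
   </analyzer>
 </fieldType>

 <!-- inside <fields>: names match the qf/pf settings above -->
 <field name="exact"    type="text_exact"    indexed="true" stored="false"/>
 <field name="noaccent" type="text_noaccent" indexed="true" stored="false"/>
 <field name="stemmed"  type="text_stemmed"  indexed="true" stored="false"/>

 <!-- directly under <schema>: feed the same text to all three fields;
      "title" is a hypothetical field assumed to be defined elsewhere -->
 <copyField source="title" dest="exact"/>
 <copyField source="title" dest="noaccent"/>
 <copyField source="title" dest="stemmed"/>

With copyField the client sends the text once and Solr analyzes it three
ways at index time. Note that a whitespace tokenizer leaves punctuation
attached to tokens, so you may want a different tokenizer for your
content.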

Per-field analyzers are very easy to use in Solr and are extremely
powerful. I wish we'd thought of that in Ultraseek.

wunder
==
Search Guy, Netflix
Formerly: Architect, Ultraseek

On 11/5/07 9:05 PM, "Papalagi Pakeha" <papalagi.pakeha@gmail.com> wrote:

> Hi all,
> 
> I use Solr 1.2 on a job advertising site. I started from the default 
> setup that runs all documents and queries through 
> EnglishPorterFilterFactory. As a result for example an ad with 
> "accounts" in its title is matched when someone runs a query for 
> "accountant" because both are stemmed to the "account" word and then 
> they match.
> 
> Is it somehow possible to give a higher score to exact matches and 
> sort them before matches from stemmed terms?
> 
> Close to this is a problem with accents - I can remove accents from 
> both documents and from queries and then run the query on non-accented
> terms. But I'd like to give higher score to documents where the search
> term matches exactly (i.e. including accents and possibly letter 
> capitalization, etc) and sort them before more fuzzy searches.
> 
> To me it looks like I have to run multiple sub-queries for each query,
> one for exact match, one for accents removed and one for stemmed words
> and then combine the results and compute the final score for each 
> match. Is that possible?
> 
> Thanks!
> 
> PaPa

