lucene-solr-user mailing list archives

From "Fornoville, Tom" <Tom.Fornovi...@truvo.com>
Subject RE: custom scorer in Solr
Date Mon, 14 Jun 2010 09:28:52 GMT
Hello Geert-Jan,

This seems like a very promising idea; I will test it out later today.
We do not expect results in all buckets: we have many use cases where
only 1 or 2 buckets are filled.
It is also not a problem if the first 10 results (or 20 in our case)
all fall in the same bucket.

I'll keep you updated on how this works out.

-----Original Message-----
From: Geert-Jan Brits [mailto:gbrits@gmail.com] 
Sent: Monday, 14 June 2010 11:00
To: solr-user@lucene.apache.org
Subject: Re: custom scorer in Solr

First of all,

Do you expect every query to return results for all 4 buckets?
In other words: say you make a SortField that sorts on score 4 first,
then 3, 2, 1.
When displaying the first 10 results, is it ok that these documents
potentially all have score 4, and thus only bucket 1 is filled?

If so, I can think of the following out-of-the-box option (I'm not sure
it performs well enough, but you can easily test that on your data):

Following your example, create 4 fields:
1. categoryExact - configure analyzers so that only full matches score,
others do not
2. categoryPartial - configure so that both full and partial matches
score (likely you have already configured this)
3. nameExact - like 1
4. namePartial - like 2

Configure copyfields: 1 --> 2 and 3 --> 4.
This way your indexing client can stay the same as it likely is at the
moment.
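
A rough sketch of the intended match behaviour, in plain Python rather
than Solr configuration. The helper names (index_document, exact_match,
partial_match) are made up for illustration, and real analyzers would
tokenize and normalize differently; the point is only that the same
source value feeds both an exact and a partial field.

```python
def index_document(category, name):
    """Mimic the copyfield setup: the client sends 2 values, the index holds 4 fields."""
    return {
        "categoryExact": category,
        "categoryPartial": category,  # copy of categoryExact (1 --> 2)
        "nameExact": name,
        "namePartial": name,          # copy of nameExact (3 --> 4)
    }


def exact_match(field_value, query):
    # Only the complete field value counts, ignoring case and outer whitespace.
    return field_value.strip().lower() == query.strip().lower()


def partial_match(field_value, query):
    # Any single word of the field value counts, so a full match is also a partial match.
    return query.strip().lower() in field_value.lower().split()


# Hypothetical document: category "Restaurant", name containing the same word.
doc = index_document(category="Restaurant", name="Pizza Restaurant Roma")
```

Here a search for "restaurant" would match categoryExact,
categoryPartial and namePartial, but not nameExact.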


Now you have 4 fields whose scores you have to combine at search time so
that the eventual scores are in [1,4].
Out-of-the-box you can do this with functionqueries.

http://wiki.apache.org/solr/FunctionQuery

I don't have time to write it down exactly, but for each field:
- calculate the score of the field (use the Query functionquery, nr 16
in the wiki). If the score is > 0, use the map function to map it to
4, 3, 2 or 1 respectively.

Now each document potentially has multiple scores, for instance 4 and 2
if your doc matches both exact and partial on category.
- use the max functionquery to return only the highest score --> 4 in
this case.
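
To make the arithmetic concrete, here is a small Python model of the
map-then-max step. It is not Solr code: BUCKETS and bucket_score are
hypothetical names, and the raw per-field scores are invented numbers
standing in for whatever the Query functionquery would return.

```python
# Bucket value per field, taken from the original question.
BUCKETS = {
    "categoryExact": 4,
    "nameExact": 3,
    "categoryPartial": 2,
    "namePartial": 1,
}


def bucket_score(raw_scores):
    """Map every field with a raw score > 0 to its bucket value, then keep the maximum."""
    mapped = [BUCKETS[field] for field, score in raw_scores.items() if score > 0]
    return max(mapped, default=0)  # 0 means the document matched no field at all


# A restaurant whose name also contains "restaurant" (raw scores are made up).
restaurant = {
    "categoryExact": 1.7,
    "nameExact": 0.0,
    "categoryPartial": 0.9,
    "namePartial": 0.4,
}
print(bucket_score(restaurant))
```

With max the document lands in bucket 4, where adding the per-field
values together would have pushed it above 4.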

You have to find out for yourself whether this performs, though.
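
For trying it on real data, the function query string could be
assembled along these lines. This is a sketch under several
assumptions: the Solr version in use must let max take more than two
function arguments (as far as I know, older releases only accept a
function and a constant, in which case this would have to be
restructured), the request parameter names (qCatExact etc.) are
invented, and the range bounds passed to map are arbitrary values meant
to cover any positive score.

```python
# Hypothetical request parameter names, one per sub-query, with their bucket values.
FIELD_BUCKETS = [
    ("qCatExact", 4),
    ("qNameExact", 3),
    ("qCatPartial", 2),
    ("qNamePartial", 1),
]

# Assumed bounds: any score inside this range is treated as "matched".
MIN_SCORE = "0.00001"
MAX_SCORE = "1000000"


def build_function_query(field_buckets=FIELD_BUCKETS):
    """Return a string of the form max(map(query($p,0),min,max,bucket),...)."""
    parts = [
        # query($p,0) yields the sub-query score, or 0 when the document does not match;
        # map(...) replaces any score within [MIN_SCORE, MAX_SCORE] by the bucket value.
        f"map(query(${param},0),{MIN_SCORE},{MAX_SCORE},{bucket})"
        for param, bucket in field_buckets
    ]
    return "max(" + ",".join(parts) + ")"


print(build_function_query())
```

Each $-parameter would then be sent along with the request and hold the
actual query against one of the 4 fields.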

Hope that helps,
Geert-Jan


2010/6/14 Fornoville, Tom <Tom.Fornoville@truvo.com>

> I've been investigating this further and I might have found another
> path to consider.
>
> Would it be possible to create a custom implementation of a SortField,
> comparable to the RandomSortField, to tackle the problem?
>
>
> I know it is not your standard question but would really appreciate
> all feedback and suggestions on this because this is the issue that
> will make or break the acceptance of Solr for this client.
>
> Thanks,
> Tom
>
> -----Original Message-----
> From: Fornoville, Tom
> Sent: Wednesday, 9 June 2010 15:35
> To: solr-user@lucene.apache.org
> Subject: custom scorer in Solr
>
> Hi all,
>
>
>
> We are currently working on a proof-of-concept for a client using Solr
> and have been able to configure all the features they want except the
> scoring.
>
>
>
> Problem is that they want scores that make results fall in buckets:
>
> *       Bucket 1: exact match on category (score = 4)
> *       Bucket 2: exact match on name (score = 3)
> *       Bucket 3: partial match on category (score = 2)
> *       Bucket 4: partial match on name (score = 1)
>
>
>
> First thing we did was develop a custom similarity class that would
> return the correct score depending on the field and an exact or
> partial match.
>
>
>
> The only problem now is that when a document matches on both the
> category and name the scores are added together.
>
> Example: searching for "restaurant" returns documents in the category
> restaurant that also have the word restaurant in their name and thus
> get a score of 5 (4+1) but they should only get 4.
>
>
>
> I assume for this to work we would need to develop a custom Scorer
> class but we have no clue on how to incorporate this in Solr.
>
> Maybe there is even a simpler solution that we don't know about.
>
>
>
> All suggestions welcome!
>
>
>
> Thanks,
>
> Tom
>
>
