lucene-solr-user mailing list archives

From Paul Tomblin <ptomb...@xcski.com>
Subject Re: Scoring algorithm?
Date Sat, 31 Oct 2009 14:22:41 GMT
If I change the schema this way, do I need to re-submit all the
documents to Solr?  And if I have them all sitting on disk as XML
files that look like
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<doc>
<field name="...">...</field>
<field name="...">...</field>
</doc>
is there a quick way to submit them all to Solr?

On Sat, Oct 31, 2009 at 10:04 AM, Yonik Seeley
<yonik@lucidimagination.com> wrote:
> On Sat, Oct 31, 2009 at 8:48 AM, Paul Tomblin <ptomblin@xcski.com> wrote:
>> Am I right in thinking that a document whose sortable field is only
>> two sentences long and contains the search term once will score higher
>> than one that is 50 sentences long and contains the search term 4
>> times?
>
> Yep.  Assuming 15 tokens per sentence, doc1 will have
> lengthNorm = 1/(2*15)**.5 or 0.18 with tf = 1**.5 or 1
> doc2 will have
> lengthNorm = 1/(50*15)**.5 or 0.04 with tf = 4**.5 or 2
>
> Or if you don't want length normalization at all, simply use
> omitNorms=true in the schema for this field.
>
>>  Is there a way to change it to score higher based only on
>> number of hits?
>
> Yes, simply use omitNorms=true in the schema.xml for this field.
>
> If you still wanted a lengthNorm, you could change the balance by
> creating a custom similarity and overriding either lengthNorm() or
> tf().
>
> -Yonik
> http://www.lucidimagination.com
>
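
To make the arithmetic quoted above concrete, here is a minimal Python sketch of just the two factors Yonik mentions, tf = freq**.5 and lengthNorm = 1/numTokens**.5. It is not Lucene code: the function names are made up for illustration, the 15 tokens per sentence is the assumption from the reply, and a real score also includes idf, query normalization, boosts, and a lossy one-byte encoding of the norm, none of which is modelled here. Treating omitNorms=true as a norm of 1.0 is likewise a simplification.

```python
import math

TOKENS_PER_SENTENCE = 15  # assumption taken from the quoted reply


def length_norm(num_tokens):
    """Length normalization as quoted: 1 / sqrt(number of tokens in the field)."""
    return 1.0 / math.sqrt(num_tokens)


def tf(freq):
    """Term-frequency factor as quoted: sqrt(number of occurrences)."""
    return math.sqrt(freq)


def field_factor(freq, num_sentences, omit_norms=False):
    """tf * lengthNorm for one field; the norm is treated as 1.0 when omitted."""
    num_tokens = num_sentences * TOKENS_PER_SENTENCE
    norm = 1.0 if omit_norms else length_norm(num_tokens)
    return tf(freq) * norm


# doc1: 2 sentences, 1 hit.  doc2: 50 sentences, 4 hits.
doc1 = field_factor(1, 2)    # 1 * 0.1826 = about 0.183
doc2 = field_factor(4, 50)   # 2 * 0.0365 = about 0.073

# Same documents with length normalization switched off.
doc1_flat = field_factor(1, 2, omit_norms=True)    # 1.0
doc2_flat = field_factor(4, 50, omit_norms=True)   # 2.0

print(f"with norms:    doc1={doc1:.3f} doc2={doc2:.3f}")
print(f"without norms: doc1={doc1_flat:.3f} doc2={doc2_flat:.3f}")
```

With norms, the short document comes out ahead (about 0.183 against 0.073) even though it has fewer hits. Without them only the tf factor is left, so the document with 4 hits wins.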



-- 
http://www.linkedin.com/in/paultomblin
http://careers.stackoverflow.com/ptomblin
