lucene-solr-user mailing list archives

From Lance Norskog <goks...@gmail.com>
Subject Re: Multiple Indexes and relevance ranking question
Date Sat, 02 Oct 2010 03:25:38 GMT
The score of a document has no absolute scale: it only has meaning relative to
the other scores in the same query.

Solr does not rank these documents correctly. Without sharing the TF/DF 
information across the shards, it cannot.
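
To make this concrete, here is a minimal sketch in Python. It assumes the classic Lucene TF-IDF formulas (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1))) and ignores length norms, query normalization and boosts. The shard sizes and document frequencies are made-up numbers for illustration, not figures from the indexes discussed in this thread.

```python
import math

def idf(num_docs, doc_freq):
    # Classic Lucene-style inverse document frequency (assumed formula).
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def score(term_freq, num_docs, doc_freq):
    # Simplified single-term score: tf * idf^2, without norms or queryNorm.
    return math.sqrt(term_freq) * idf(num_docs, doc_freq) ** 2

# Hypothetical shards: the term is rare on shard A and common on shard B.
shard_a = {"num_docs": 1000, "doc_freq": 10}
shard_b = {"num_docs": 1000, "doc_freq": 500}

# The same document (term occurs 4 times) scored with each shard's local stats.
local_a = score(4, **shard_a)
local_b = score(4, **shard_b)

# The same document scored with statistics pooled across both shards.
global_score = score(
    4,
    shard_a["num_docs"] + shard_b["num_docs"],
    shard_a["doc_freq"] + shard_b["doc_freq"],
)

print(local_a, local_b, global_score)
```

An identical document gets a much higher score on the shard where the term is rare, so merging the per-shard result lists by score does not reproduce the ranking a single combined index would give.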

If the shards each have "a lot" of the same kind of document, this 
problem averages out. That is, the "statistical fingerprint" across the 
shards is similar enough that each index gives the same numerical range. 
Yes, this is hand-wavy, and we don't have a measuring tool that 
verifies this assertion.
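
A rough way to see the "averages out" effect, again as a sketch in Python using the classic Lucene idf formula as an assumption. The three shards below are hypothetical; each holds the term in roughly 5% of its documents.

```python
import math

def idf(num_docs, doc_freq):
    # Classic Lucene-style inverse document frequency (assumed formula).
    return 1.0 + math.log(num_docs / (doc_freq + 1))

# Hypothetical (num_docs, doc_freq) pairs with a similar term distribution.
shards = [(10000, 510), (12000, 590), (9000, 445)]

# IDF as each shard computes it from its own local statistics.
local_idfs = [idf(n, df) for n, df in shards]

# IDF computed from statistics pooled across all shards.
global_idf = idf(sum(n for n, _ in shards), sum(df for _, df in shards))

# Spread between the largest and smallest local IDF.
spread = max(local_idfs) - min(local_idfs)

print(local_idfs, global_idf, spread)
```

When the term is distributed similarly on every shard, the local values stay close to each other and to the pooled value, so scores from different shards end up roughly comparable.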

Lance

Valli Indraganti wrote:
> I am new to Solr and search technologies. I am playing around with
> multiple indexes. I configured Solr for Tomcat, created two tomcat fragments
> so that two solr webapps listen on port 8080 in tomcat. I have created two
> separate indexes using each webapp successfully.
>
> My documents are very primitive. Below is the structure. I have four such
> documents with different doc ids and an increasing number of occurrences of
> the word "Hello" corresponding to the name of the document (this is only to
> make my analysis of the results easier). Documents One and Two are in shard 1,
> and Three and Four are in shard 2. Obviously, document Two is ranked higher
> when queried against that index (for the word Hello), and document Four is
> ranked higher when queried against the second index. When using the shards
> parameter, the scores remain unaltered.
> My question is, if the distributed search does not consider IDF, how is it
> able to rank these documents correctly? Or do I not have the indexes truly
> distributed? Is something wrong with my term distribution?
>
> <add>
>   <doc>
>     <field name="id">Valli1</field>
>     <field name="name">One</field>
>     <field name="text">Hello!This is a test document testing relevancy
> scores.</field>
>   </doc>
> </add>
>
>    
