lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Different order of docs between SOLR-4.10.4 to SOLR-6.5.1
Date Fri, 04 Aug 2017 15:36:41 GMT
In addition to Shawn's comments, deleted but not yet merged documents
alter the statistics used for scoring, so the only hope of getting
comparable scores is on an optimized index. Note that I would
recommend optimizing _only_ for testing. Don't use it in a production
system unless the index is static. I.e., if your pattern is to build
once a day and then optimize, optimizing is fine, but not on a
continuously changing index.
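
To illustrate the point about deleted documents, here is a minimal sketch (not
Solr code) of how the IDF part of a BM25 score shifts when documents that are
deleted but not yet merged away still count toward the index statistics. The
formula is the IDF used by Lucene's BM25Similarity; the document counts are
made-up numbers chosen only for illustration.

```python
import math

def bm25_idf(doc_count, doc_freq):
    """IDF term as computed by Lucene's BM25Similarity."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

# Hypothetical numbers, for illustration only.
# Optimized index: only live documents are counted.
live_docs = 1000
live_docs_with_term = 100

# Unoptimized index: 500 deleted documents are still counted,
# and 200 of them contained the term.
deleted_docs = 500
deleted_docs_with_term = 200

idf_optimized = bm25_idf(live_docs, live_docs_with_term)
idf_with_deletes = bm25_idf(live_docs + deleted_docs,
                            live_docs_with_term + deleted_docs_with_term)

print(f"IDF on optimized index:     {idf_optimized:.4f}")
print(f"IDF with unmerged deletes:  {idf_with_deletes:.4f}")
```

The same live documents and the same query term produce two different IDF
values, which is enough to change both the scores and the ordering.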

Best,
Erick

On Fri, Aug 4, 2017 at 5:52 AM, Shawn Heisey <apache@elyograg.org> wrote:
> On 8/4/2017 1:02 AM, SOLR4189 wrote:
>> I need to upgrade from SOLR-4.10.3 to SOLR-6.5.1 in a production environment.
>> When I checked it in the test environment, I noticed that the order of returned
>> docs for each query is different. The score has changed as well. I use the same
>> similarity algorithm, Okapi BM25, as in the previous version. The number of shards
>> and the number of docs per shard also haven't changed.
>
> You're comparing versions released more than two years apart, and across
> two major version upgrades.
>
> Solr is an application built around Lucene.  The score calculation in
> Lucene is frequently tweaked, producing slightly different results even
> with identical data.  Over such a large version discrepancy, I would be
> very surprised if the order and the scores were the same.
>
> Is the index identical between the versions?  If the indexes were each
> built from scratch by their respective versions, rather than going
> through an index upgrade procedure, they are very likely NOT completely
> identical.  Text analysis components are also tweaked frequently, to fix
> bugs and improve behavior.
>
> If the shard hash ranges are not the same on the old and new versions,
> that could contribute to differences in scoring as well.
>
> Are you writing because you're seeing different results, or because you
> think the order you're seeing in the newer version is wrong?
>
> Thanks,
> Shawn
>
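
As a rough illustration of the shard hash range point above: assuming each
shard scores with its own local term statistics (no distributed statistics
configured), the same documents split across shards in two different ways
give different per-shard IDF values. This is a toy sketch with a made-up
eight-document corpus and hypothetical shard assignments, not Solr code.

```python
import math

def bm25_idf(doc_count, doc_freq):
    """IDF term as computed by Lucene's BM25Similarity."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

# Hypothetical corpus: True means the document contains the query term.
contains_term = [True, True, True, False, False, False, False, False]

def per_shard_idf(assignment, num_shards=2):
    """Compute the IDF each shard would see from its local statistics.

    assignment[i] is the (hypothetical) shard id of document i.
    """
    idfs = []
    for shard in range(num_shards):
        members = [has for has, s in zip(contains_term, assignment) if s == shard]
        idfs.append(bm25_idf(len(members), sum(members)))
    return idfs

# Layout A: first four docs on shard 0, last four on shard 1.
layout_a = per_shard_idf([0, 0, 0, 0, 1, 1, 1, 1])
# Layout B: docs alternate between the two shards.
layout_b = per_shard_idf([0, 1, 0, 1, 0, 1, 0, 1])

print("Layout A per-shard IDF:", [round(x, 4) for x in layout_a])
print("Layout B per-shard IDF:", [round(x, 4) for x in layout_b])
```

Both layouts have the same number of shards and the same number of docs per
shard, yet the term statistics each shard sees are different.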
