lucene-java-user mailing list archives

From Adrien Grand <jpou...@gmail.com>
Subject Re: Question on Lucene Behavior in 4.9 vs 5.4.1
Date Fri, 22 Apr 2016 07:53:04 GMT
FuzzyQuery scoring was changed in Lucene 5.3:
https://issues.apache.org/jira/browse/LUCENE-329

Maybe look at the result of IndexSearcher.explain to understand why the
"Boston" doc got a lower score than your "Basti Bosan" doc?
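
As a rough illustration of why all three documents can match "bostn~" in the first place, here is a minimal, Lucene-free sketch that computes plain Levenshtein distances from the query term to candidate index terms. It rests on assumptions: that the name field is tokenized and lowercased into terms such as "boston", "bosan", "basin" and "basti" (the analyzer is not shown in the original message), and that FuzzyQuery runs with its default maximum of 2 edits. FuzzyQuery by default also counts a transposition as a single edit, which this sketch does not model; that makes no difference for these particular terms. The class and method names are hypothetical.

```java
final class FuzzyEditDistance {

    // Classic dynamic-programming Levenshtein distance:
    // insertions, deletions and substitutions each cost 1.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        String query = "bostn";
        // Assumed index terms; the real terms depend on the analyzer used.
        String[] terms = {"boston", "bosan", "basin", "basti"};
        for (String term : terms) {
            System.out.println(term + " -> " + editDistance(query, term));
        }
    }
}
```

Under these assumptions "boston" and "bosan" are both one edit away from "bostn", so the matching terms alone do not separate "Boston" from "Basti Bosan"; the per-document breakdown from IndexSearcher.explain is what shows where the scores diverge.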

On Thu, Apr 21, 2016 at 15:39, Jeremy Glesner <jeremy@bericotechnologies.com>
wrote:

> Hello,
>
> I'm witnessing a change in behavior between Lucene 4.9 and 5.4.1 that I
> don't quite understand.
> I'd like to track down what's happening under the hood. I'm working to
> update the dependencies of an open source geospatial resolution tool (
> https://github.com/Berico-Technologies/CLAVIN), which uses Lucene. I've
> indexed the geonames.org database using both Lucene 4.9 and 5.4.1.  We
> index on the Population of each city for later sorting on query.
>
> When running a fuzzy query "bostn~" with Occur.MUST in 4.9, we get the
> expected result of Boston, where 6793534 is a boosted population.  Here is
> the scoreDoc.toString():
>
> Boston: doc=19586055 score=NaN shardIndex=-1 fields=[2.971942, 6793534]
>
> However, using 5.4.1, the fuzzy match with Occur.MUST returns "Basti Bosan"
> and "Boston Basin", both of which have a population of zero before
> returning Boston.
>
> Basti Bosan: doc=11707183 score=NaN shardIndex=0 fields=[1.5721874, 0]
> Boston Basin: doc=12728320 score=NaN shardIndex=0 fields=[1.5721874, 0]
> Boston: doc=17515475 score=NaN shardIndex=0 fields=[1.4374285, 6793534]
>
> I'm wondering if something with the FIELD_SCORE calculation changed between
> 4.9 and 5.4.1, or perhaps I've done something incorrect in building the
> index, etc.
>
> It's worth mentioning that for this test I have built an index w/ both 4.9
> and 5.4.1 using the same geonames database to ensure consistency.  Also,
> sort is set up with both versions in the same way:
>
> private static final Sort POPULATION_SORT = new Sort(new SortField[] {
>     SortField.FIELD_SCORE,
>     new SortedNumericSortField(SORT_POP.key(), SortField.Type.LONG, true)
> });
>
> With regard to building the index, in 4.9, we added the population sort
> field to the index like so:
>
> doc.add(new LongField(SORT_POP.key(), geoName.getPopulation(),
>     Field.Store.YES));
>
> Because you can't sort on docValue = NONE anymore, in 5.4.1, we now add it
> like this:
>
> doc.add(new LongField(SORT_POP.key(), geoName.getPopulation(),
>     LONG_FIELD_TYPE_STORED_SORTED));
>
> where LONG_FIELD_TYPE_STORED_SORTED is:
>
>
> private static final FieldType LONG_FIELD_TYPE_STORED_SORTED = new FieldType();
>
> static {
>     LONG_FIELD_TYPE_STORED_SORTED.setTokenized(false);
>     LONG_FIELD_TYPE_STORED_SORTED.setOmitNorms(true);
>     LONG_FIELD_TYPE_STORED_SORTED.setIndexOptions(IndexOptions.DOCS);
>     LONG_FIELD_TYPE_STORED_SORTED.setNumericType(FieldType.NumericType.LONG);
>     LONG_FIELD_TYPE_STORED_SORTED.setStored(true);
>     LONG_FIELD_TYPE_STORED_SORTED.setDocValuesType(DocValuesType.NUMERIC);
>     LONG_FIELD_TYPE_STORED_SORTED.freeze();
> }
>
> I would greatly appreciate any insights here; and I'm happy to answer
> questions to unravel this a bit more. Thank you for your time!
>
> V/r,
> Jeremy
>
