lucene-java-user mailing list archives

From Jeremy Glesner <jer...@bericotechnologies.com>
Subject Question on Lucene Behavior in 4.9 vs 5.4.1
Date Thu, 21 Apr 2016 13:39:37 GMT
Hello,

I'm witnessing a change in behavior between Lucene 4.9 and 5.4.1 that I
don't quite understand.
I'd like to track down what's happening under the hood. I'm working to
update the dependencies of an open source geospatial resolution tool (
https://github.com/Berico-Technologies/CLAVIN), which uses Lucene. I've
indexed the geonames.org database using both Lucene 4.9 and 5.4.1. We
index the population of each city so that results can be sorted by it at
query time.

When running a fuzzy query "bostn~" with Occur.MUST in 4.9, we get the
expected result of Boston, where 6793534 is a boosted population.  Here is
the scoreDoc.toString():

*Boston: doc=19586055 score=NaN shardIndex=-1 fields=[2.971942, 6793534]*

However, in 5.4.1 the same fuzzy query with Occur.MUST returns "Basti Bosan"
and "Boston Basin", both of which have a population of zero, before it
returns Boston.

*Basti Bosan: doc=11707183 score=NaN shardIndex=0 fields=[1.5721874, 0]*
*Boston Basin: doc=12728320 score=NaN shardIndex=0 fields=[1.5721874, 0]*
*Boston: doc=17515475 score=NaN shardIndex=0 fields=[1.4374285, 6793534]*
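
For reference, here is a minimal sketch of why names such as "Basti Bosan" and "Boston Basin" can match "bostn~" at all. It assumes the indexed names are tokenized into lowercase words (the analyzer is not shown in this message) and that the fuzzy query allows up to two edits, which is FuzzyQuery's default. The class and method names are hypothetical, and this is plain Levenshtein distance rather than Lucene's automaton-based matching, which by default also counts a transposition as one edit.

```java
// Hypothetical helper, not Lucene code: plain Levenshtein edit distance.
final class EditDistanceSketch {

    // Classic dynamic-programming edit distance using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1), // insert, delete
                        prev[j - 1] + cost);                    // substitute or match
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        // Assumed tokens taken from the three place names in the results above.
        for (String term : new String[] {"boston", "bosan", "basin", "basti"}) {
            System.out.println(term + " -> " + levenshtein("bostn", term));
        }
    }
}
```

Under these assumptions every one of those tokens lies within two edits of "bostn", so all three documents are legitimate fuzzy matches; the open question is only how they are scored relative to each other.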

I'm wondering if something with the FIELD_SCORE calculation changed between
4.9 and 5.4.1, or perhaps I've done something incorrect in building the
index, etc.

It's worth mentioning that for this test I built an index with both 4.9
and 5.4.1 from the same geonames database to ensure consistency. Also,
the sort is set up the same way in both versions:

*private static final Sort POPULATION_SORT = new Sort(new SortField[] {
    SortField.FIELD_SCORE,
    new SortedNumericSortField(SORT_POP.key(), SortField.Type.LONG, true)
});*
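
To make the effect of this two-key sort concrete, here is a minimal plain-Java sketch (no Lucene dependency) that orders the three 5.4.1 hits quoted earlier by score descending and then by population descending. The Hit class and the comparator are hypothetical stand-ins for illustration only; they are not CLAVIN or Lucene code, and tie-breaking by document id is not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a sorted search hit: fields=[score, population].
final class PopulationSortSketch {

    static final class Hit {
        final String name;
        final float score;
        final long population;

        Hit(String name, float score, long population) {
            this.name = name;
            this.score = score;
            this.population = population;
        }
    }

    // Primary key: score, highest first. Secondary key: population, highest first.
    static final Comparator<Hit> SCORE_THEN_POPULATION =
            Comparator.comparingDouble((Hit h) -> h.score).reversed()
                    .thenComparing(
                            Comparator.comparingLong((Hit h) -> h.population).reversed());

    // Returns the hit names in sorted order without modifying the input list.
    static List<String> sortedNames(List<Hit> hits) {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort(SCORE_THEN_POPULATION); // List.sort is stable for equal keys
        List<String> names = new ArrayList<>();
        for (Hit h : copy) {
            names.add(h.name);
        }
        return names;
    }

    public static void main(String[] args) {
        // Values copied from the 5.4.1 scoreDoc output in this message.
        List<Hit> hits = Arrays.asList(
                new Hit("Boston", 1.4374285f, 6793534L),
                new Hit("Basti Bosan", 1.5721874f, 0L),
                new Hit("Boston Basin", 1.5721874f, 0L));
        System.out.println(sortedNames(hits));
    }
}
```

Because the population is only consulted when two scores are exactly equal, Boston's lower score in 5.4.1 places it last regardless of its population. The ordering difference therefore comes from the score values themselves, not from the population key.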

With regard to building the index, in 4.9, we added the population sort
field to the index like so:

*doc.add(new LongField(SORT_POP.key(), geoName.getPopulation(),
Field.Store.YES));*

Because you can no longer sort on a field whose doc values type is NONE, in
5.4.1 we now add it like this:

*doc.add(new LongField(SORT_POP.key(), geoName.getPopulation(),
LONG_FIELD_TYPE_STORED_SORTED));*

where LONG_FIELD_TYPE_STORED_SORTED is:


*private static final FieldType LONG_FIELD_TYPE_STORED_SORTED = new FieldType();*

*static {
    LONG_FIELD_TYPE_STORED_SORTED.setTokenized(false);
    LONG_FIELD_TYPE_STORED_SORTED.setOmitNorms(true);
    LONG_FIELD_TYPE_STORED_SORTED.setIndexOptions(IndexOptions.DOCS);
    LONG_FIELD_TYPE_STORED_SORTED.setNumericType(FieldType.NumericType.LONG);
    LONG_FIELD_TYPE_STORED_SORTED.setStored(true);
    LONG_FIELD_TYPE_STORED_SORTED.setDocValuesType(DocValuesType.NUMERIC);
    LONG_FIELD_TYPE_STORED_SORTED.freeze();
}*

I would greatly appreciate any insights here; and I'm happy to answer
questions to unravel this a bit more. Thank you for your time!

V/r,
Jeremy
