lucene-java-user mailing list archives

From Jeremy Glesner
Subject Question on Lucene Behavior in 4.9 vs 5.4.1
Date Thu, 21 Apr 2016 13:39:37 GMT

I'm witnessing a change in behavior between Lucene 4.9 and 5.4.1 that I
don't quite understand.
I'd like to track down what's happening under the hood. I'm working to
update the dependencies of an open source geospatial resolution tool that
uses Lucene. I've indexed the database using both Lucene 4.9 and 5.4.1. We
index the population of each city so that query results can be sorted on it.

When running a fuzzy query "bostn~" with Occur.MUST in 4.9, we get the
expected result of Boston, where 6793534 is a boosted population.  Here is
the scoreDoc.toString():

*Boston: doc=19586055 score=NaN shardIndex=-1 fields=[2.971942, 6793534]*

However, using 5.4.1, the same fuzzy match with Occur.MUST returns "Basti
Bosan" and "Boston Basin", both of which have a population of zero, before
it returns Boston.

*Basti Bosan: doc=11707183 score=NaN shardIndex=0 fields=[1.5721874, 0]*

*Boston Basin: doc=12728320 score=NaN shardIndex=0 fields=[1.5721874, 0]*

*Boston: doc=17515475 score=NaN shardIndex=0 fields=[1.4374285, 6793534]*
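
As a point of reference for why all three names match "bostn~" at all, here is a minimal sketch that computes plain Levenshtein distances from "bostn" to the individual name tokens. It is not Lucene code: it assumes the names are indexed as lowercased single-word terms and that the fuzzy query accepts terms within two edits, and the class and method names are made up for illustration.

```java
public class EditDistanceSketch {

    /** Classic two-row dynamic-programming Levenshtein distance. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        // Assumed terms: lowercased tokens of the three city names.
        String[] terms = {"boston", "bosan", "basin", "basti"};
        for (String term : terms) {
            System.out.println("bostn -> " + term + ": " + levenshtein("bostn", term));
        }
    }
}
```

Under those assumptions every one of these tokens is within two edits of "bostn", so "Basti Bosan" and "Boston Basin" each contain two candidate terms while "Boston" contains one. Whether that is what drives the score difference between versions is not something this sketch establishes.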

I'm wondering if something with the FIELD_SCORE calculation changed between
4.9 and 5.4.1, or perhaps I've done something incorrect in building the
index, etc.

It's worth mentioning that for this test I have built an index with both 4.9
and 5.4.1 using the same geonames database to ensure consistency.  Also,
the sort is set up the same way in both versions:

*private static final Sort POPULATION_SORT = new Sort(new SortField[] {
   SortField.FIELD_SCORE,
   new SortedNumericSortField(SORT_POP.key(), SortField.Type.LONG, true)
});*
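
To make the ordering implied by this sort concrete, here is a minimal plain-Java sketch (no Lucene classes) that orders the three 5.4.1 hits by score descending, then population descending. The Hit class, the comparator name, and the final tiebreak on ascending doc id are assumptions made for illustration; the scores, populations, and doc ids are the ones reported above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortOrderSketch {

    /** Hypothetical stand-in for one search hit; not a Lucene class. */
    static final class Hit {
        final String name;
        final int doc;
        final float score;
        final long population;

        Hit(String name, int doc, float score, long population) {
            this.name = name;
            this.doc = doc;
            this.score = score;
            this.population = population;
        }
    }

    /** Score descending, then population descending, then doc id ascending (assumed tiebreak). */
    static final Comparator<Hit> SCORE_THEN_POPULATION =
        Comparator.comparingDouble((Hit h) -> h.score).reversed()
            .thenComparing(Comparator.comparingLong((Hit h) -> h.population).reversed())
            .thenComparingInt(h -> h.doc);

    static List<String> sortedNames() {
        List<Hit> hits = new ArrayList<>(Arrays.asList(
            new Hit("Boston", 17515475, 1.4374285f, 6793534L),
            new Hit("Boston Basin", 12728320, 1.5721874f, 0L),
            new Hit("Basti Bosan", 11707183, 1.5721874f, 0L)));
        hits.sort(SCORE_THEN_POPULATION);
        List<String> names = new ArrayList<>();
        for (Hit h : hits) {
            names.add(h.name);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(sortedNames());
    }
}
```

Because the score is the primary key, population only separates hits whose scores are exactly equal. With the scores reported for 5.4.1, Boston sorts last regardless of its population, so the open question is why its score is lower than the other two, not how the population field is sorted.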

With regard to building the index, in 4.9, we added the population sort
field to the index like so:

*doc.add(new LongField(SORT_POP.key(), geoName.getPopulation(),

Because 5.4.1 no longer allows sorting on a field whose docValues type is
NONE, we now add it like this:

*doc.add(new LongField(SORT_POP.key(), geoName.getPopulation(),


*private static final FieldType LONG_FIELD_TYPE_STORED_SORTED = new

*static {   LONG_FIELD_TYPE_STORED_SORTED.setTokenized(false);

I would greatly appreciate any insights here; and I'm happy to answer
questions to unravel this a bit more. Thank you for your time!

