lucene-java-user mailing list archives

From Nicolás Lichtmaier <nicol...@wolfram.com.INVALID>
Subject Re: Lucene 8.7 error searching an index created with 8.3
Date Tue, 22 Dec 2020 19:05:25 GMT
I'd like to add that if I enable assertions I get a stack trace like this:


java.lang.AssertionError
     at org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$EverythingEnum.nextPosition(Lucene50PostingsReader.java:903)
     at org.apache.lucene.search.PhrasePositions.nextPosition(PhrasePositions.java:57)
     at org.apache.lucene.search.PhrasePositions.firstPosition(PhrasePositions.java:46)
     at org.apache.lucene.search.SloppyPhraseMatcher.initSimple(SloppyPhraseMatcher.java:368)
     at org.apache.lucene.search.SloppyPhraseMatcher.initPhrasePositions(SloppyPhraseMatcher.java:356)
     at org.apache.lucene.search.SloppyPhraseMatcher.reset(SloppyPhraseMatcher.java:153)
     at org.apache.lucene.search.PhraseScorer$1.matches(PhraseScorer.java:49)
     at org.apache.lucene.search.DoubleValuesSource$WeightDoubleValuesSource$1.advanceExact(DoubleValuesSource.java:631)
     at org.apache.lucene.queries.function.FunctionScoreQuery$QueryBoostValuesSource$1.advanceExact(FunctionScoreQuery.java:343)
     at org.apache.lucene.search.DoubleValues$1.advanceExact(DoubleValues.java:53)
     at org.apache.lucene.search.DoubleValues$1.advanceExact(DoubleValues.java:53)
     at org.apache.lucene.queries.function.FunctionScoreQuery$MultiplicativeBoostValuesSource$1.advanceExact(FunctionScoreQuery.java:270)
     at org.apache.lucene.queries.function.FunctionScoreQuery$FunctionScoreWeight$1.score(FunctionScoreQuery.java:228)
     at org.apache.lucene.search.DisjunctionMaxScorer.score(DisjunctionMaxScorer.java:67)
     at org.apache.lucene.search.DisjunctionScorer.score(DisjunctionScorer.java:194)
     at org.apache.lucene.search.DoubleValuesSource$2.doubleValue(DoubleValuesSource.java:344)
     at org.apache.lucene.queries.function.ValueSource$WrappedDoubleValuesSource$1.advanceExact(ValueSource.java:215)
     at com.wolfram.textsearch.MultiplicationDoubleValuesSource$1.advanceExact(MultiplicationDoubleValuesSource.java:60)
     ... 27 more

This means that posPendingCount in Lucene50PostingsReader is 0 when
nextPosition() is called.

At the point the assertion fails these are the other values in this object:


> encoded = {byte[512]@2705} [0, 112, 7, 20, -48, -8, 16, 96, -99, 25, +502 more]
> docDeltaBuffer = {int[147]@2706} [1164, 2, 506, 183, 3, 190, 1, 1, 1, 57, +137 more]
> freqBuffer = {int[147]@2707} [1, 1, 1, 1, 1, 2, 1, 2, 1, 3, +137 more]
> posDeltaBuffer = {int[147]@2708} [7, 7, 333, 248, 262, 157, 414, 104, 157, 409, +137 more]
> payloadLengthBuffer = null
> offsetStartDeltaBuffer = null
> offsetLengthBuffer = null
> payloadBytes = null
> payloadByteUpto = 0
> payloadLength = 0
> lastStartOffset = 0
> startOffset = -1
> endOffset = -1
> docBufferUpto = 3
> posBufferUpto = 3
> skipper = null
> skipped = false
> startDocIn = {ByteBufferIndexInput$SingleBufferImpl@2709} "MMapIndexInput(path="/usr/local/Wolfram/Mathematica/12.3/Documentation/English/SearchIndex/3/_0.cfs") [slice=_0_Lucene50_0.doc]"
> docIn = {ByteBufferIndexInput$SingleBufferImpl@2710} "MMapIndexInput(path="/usr/local/Wolfram/Mathematica/12.3/Documentation/English/SearchIndex/3/_0.cfs") [slice=_0_Lucene50_0.doc]"
> posIn = {ByteBufferIndexInput$SingleBufferImpl@2704} "MMapIndexInput(path="/usr/local/Wolfram/Mathematica/12.3/Documentation/English/SearchIndex/3/_0.cfs") [slice=_0_Lucene50_0.pos]"
> payIn = null
> payload = null
> indexHasOffsets = false
> indexHasPayloads = false
> docFreq = 69
> totalTermFreq = 146
> docUpto = 3
> doc = 1672
> accum = 1672
> freq = 1
> position = 333
> posPendingCount = 0
> posPendingFP = -1
> payPendingFP = 0
> docTermStartFP = 683174
> posTermStartFP = 236756
> payTermStartFP = 0
> lastPosBlockFP = 236949
> skipOffset = -1
> nextSkipDoc = 2147483647
> needsOffsets = false
> needsPayloads = false
> singletonDocID = -1
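
One way to cross-check the dump above, as a rough sketch rather than anything taken from Lucene itself: assuming doc IDs are stored as deltas and each document owes as many positions as its freq, the first docBufferUpto entries of docDeltaBuffer should add up to doc, and the matching freqBuffer entries should add up to the number of positions consumed so far. The class and method names below (PostingsDumpCheck, docAfter, positionsOwed) are made up for illustration; only the numbers come from the dump.

```java
// Hypothetical helper, not part of Lucene: re-derives a few fields of the
// dump from the buffers, under the assumption of plain delta encoding.
class PostingsDumpCheck {

    /** Doc ID reached after consuming the first {@code upto} doc deltas. */
    static int docAfter(int[] docDeltas, int upto) {
        int doc = 0;
        for (int i = 0; i < upto; i++) {
            doc += docDeltas[i];
        }
        return doc;
    }

    /** Positions belonging to the first {@code upto} docs (sum of their freqs). */
    static int positionsOwed(int[] freqs, int upto) {
        int total = 0;
        for (int i = 0; i < upto; i++) {
            total += freqs[i];
        }
        return total;
    }

    public static void main(String[] args) {
        // First ten entries of each buffer, copied from the dump.
        int[] docDeltas = {1164, 2, 506, 183, 3, 190, 1, 1, 1, 57};
        int[] freqs = {1, 1, 1, 1, 1, 2, 1, 2, 1, 3};
        int docBufferUpto = 3; // from the dump

        // 1164 + 2 + 506 = 1672, which matches doc and accum in the dump.
        System.out.println("doc = " + docAfter(docDeltas, docBufferUpto));
        // 1 + 1 + 1 = 3, which matches posBufferUpto in the dump.
        System.out.println("positions = " + positionsOwed(freqs, docBufferUpto));
    }
}
```

If that reading is right, the buffers look internally consistent: doc 1672 has freq = 1, its single position delta (posDeltaBuffer[2] = 333) equals the current position, and posBufferUpto = 3 says that delta has already been consumed. That would make the failing call a second nextPosition() on a document whose only position was already read, rather than a sign of unreadable data, but this is an inference from the dump and not something confirmed here.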

Maybe this information is useful for seeing what's going on, or at least
for adding some code somewhere that helps clarify this issue.

Thanks!


On 24/11/20 at 11:36, Nicolás Lichtmaier wrote:
> This is reproducible only within our product, I haven't yet been able 
> to isolate this and reproduce it standalone. It's Java 11.
>
> Yes, I've run CheckIndex with the "-slow" option and with assertions 
> enabled.
>
> On 24/11/20 at 11:32, Adrien Grand wrote:
>> This is related to phrase matching indeed. Positions are stored in 
>> blocks of 128 values, where every block is encoded with a different 
>> number of bits per value. And the error you are seeing suggests that 
>> one block reports 69 bits per value.
>>
>> The fact that CheckIndex didn't complain is surprising. Did you only 
>> verify checksums (the -fast option) or did you run the full CheckIndex?
>>
>> Is your problem reproducible? If yes, does it still reproduce if you 
>> move to a recent JVM?
>>
>> On Tue, Nov 24, 2020 at 3:22 PM Nicolás Lichtmaier
>> <nicolasl@wolfram.com> wrote:
>>
>>     Lucene 8.7's CheckIndex says there are no errors in the index.
>>
>>     On closer inspection this seems related to phrase matching...
>>
>>     On 24/11/20 at 05:18, Adrien Grand wrote:
>>     > Can you run CheckIndex on your index to make sure it is not corrupt?
>>     >
>>     > On Tue, Nov 24, 2020 at 1:01 AM Nicolás Lichtmaier
>>     > <nicolasl@wolfram.com.invalid> wrote:
>>     >
>>     >> I'm seeing errors like this one (using backwards codecs):
>>     >>
>>     >> java.lang.ArrayIndexOutOfBoundsException: Index 69 out of bounds for length 33
>>     >>       at org.apache.lucene.codecs.lucene50.ForUtil.readBlock(ForUtil.java:196)
>>     >>       at org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$EverythingEnum.refillPositions(Lucene50PostingsReader.java:721)
>>     >>       at org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$EverythingEnum.nextPosition(Lucene50PostingsReader.java:924)
>>     >>       at org.apache.lucene.search.PhrasePositions.nextPosition(PhrasePositions.java:57)
>>     >>       at org.apache.lucene.search.SloppyPhraseMatcher.advancePP(SloppyPhraseMatcher.java:262)
>>     >>       at org.apache.lucene.search.SloppyPhraseMatcher.nextMatch(SloppyPhraseMatcher.java:173)
>>     >>       at org.apache.lucene.search.PhraseScorer$1.matches(PhraseScorer.java:58)
>>     >>       at org.apache.lucene.search.DoubleValuesSource$WeightDoubleValuesSource$1.advanceExact(DoubleValuesSource.java:631)
>>     >>       at org.apache.lucene.queries.function.FunctionScoreQuery$QueryBoostValuesSource$1.advanceExact(FunctionScoreQuery.java:343)
>>     >>       at org.apache.lucene.search.DoubleValues$1.advanceExact(DoubleValues.java:53)
>>     >>       at org.apache.lucene.search.DoubleValues$1.advanceExact(DoubleValues.java:53)
>>     >>       at org.apache.lucene.queries.function.FunctionScoreQuery$MultiplicativeBoostValuesSource$1.advanceExact(FunctionScoreQuery.java:270)
>>     >>       at org.apache.lucene.queries.function.FunctionScoreQuery$FunctionScoreWeight$1.score(FunctionScoreQuery.java:228)
>>     >>       at org.apache.lucene.search.DisjunctionMaxScorer.score(DisjunctionMaxScorer.java:67)
>>     >>       at org.apache.lucene.search.DisjunctionScorer.score(DisjunctionScorer.java:194)
>>     >>       at org.apache.lucene.search.DoubleValuesSource$2.doubleValue(DoubleValuesSource.java:344)
>>     >>       at org.apache.lucene.queries.function.FunctionScoreQuery$MultiplicativeBoostValuesSource$1.doubleValue(FunctionScoreQuery.java:265)
>>     >>       at org.apache.lucene.queries.function.FunctionScoreQuery$FunctionScoreWeight$1.score(FunctionScoreQuery.java:229)
>>     >>       at org.apache.lucene.search.TopScoreDocCollector$SimpleTopScoreDocCollector$1.collect(TopScoreDocCollector.java:76)
>>     >>       at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:276)
>>     >>       at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:232)
>>     >>       at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39)
>>     >>       at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:661)
>>     >>       at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:445)
>>     >>       at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:574)
>>     >>       at org.apache.lucene.search.IndexSearcher.searchAfter(IndexSearcher.java:421)
>>     >>       at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:432)
>>     >>
>>     >> They seem to be connected with double values stored as "docvalues" and
>>     >> used in formulas to affect the scores.
>>     >>
>>     >> Is there any known incompatibility? Is this something that should work?
>>     >> Must I rebuild the indices with 8.7? (That would be bad for our use case
>>     >> here.)
>>     >>
>>     >> Thanks!
>>     >>
>>
>> -- 
>> Adrien
>
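
To make the explanation quoted above more concrete (positions stored in blocks of 128 values, each block with its own number of bits per value), here is a toy sketch of such a block format. It is not Lucene's ForUtil; the class name ToyBlockCodec, the one-byte header layout and the 33-entry size table are assumptions chosen so that a bad header byte fails the same way as the "Index 69 out of bounds for length 33" error in the original report.

```java
// Toy block codec, NOT Lucene's implementation: one header byte holding the
// bits-per-value, followed by 128 values packed at that width.
class ToyBlockCodec {
    static final int BLOCK_SIZE = 128;

    // One entry per legal bit width 0..32, hence 33 entries (an assumption
    // of this sketch, mirroring the "length 33" in the reported exception).
    static final int[] ENCODED_BYTES = new int[33];
    static {
        for (int bits = 0; bits <= 32; bits++) {
            ENCODED_BYTES[bits] = BLOCK_SIZE * bits / 8;
        }
    }

    static byte[] encode(int[] values) {
        if (values.length != BLOCK_SIZE) {
            throw new IllegalArgumentException("expected " + BLOCK_SIZE + " values");
        }
        int or = 0;
        for (int v : values) {
            or |= v;
        }
        // Smallest width that can hold every value in this block.
        int bits = 32 - Integer.numberOfLeadingZeros(or);
        byte[] out = new byte[1 + ENCODED_BYTES[bits]];
        out[0] = (byte) bits;
        int bitPos = 8; // skip the header byte
        for (int v : values) {
            for (int b = bits - 1; b >= 0; b--) {
                if (((v >>> b) & 1) != 0) {
                    out[bitPos >>> 3] |= (byte) (0x80 >>> (bitPos & 7));
                }
                bitPos++;
            }
        }
        return out;
    }

    static int[] decode(byte[] block) {
        int bits = block[0];
        // A corrupt or misread header (for example 69) is not a valid index
        // here, so this lookup throws ArrayIndexOutOfBoundsException.
        int encodedBytes = ENCODED_BYTES[bits];
        if (block.length != 1 + encodedBytes) {
            throw new IllegalArgumentException("unexpected block length");
        }
        int[] values = new int[BLOCK_SIZE];
        int bitPos = 8;
        for (int i = 0; i < BLOCK_SIZE; i++) {
            int v = 0;
            for (int b = 0; b < bits; b++) {
                int bit = (block[bitPos >>> 3] >>> (7 - (bitPos & 7))) & 1;
                v = (v << 1) | bit;
                bitPos++;
            }
            values[i] = v;
        }
        return values;
    }
}
```

In this sketch the decoder trusts the header byte completely, so the same failure appears whether the byte is really corrupt on disk or the reader simply starts decoding at the wrong file offset. Which of those applies to the index in this thread is not established here.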

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

