lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: merge problems
Date Tue, 11 Oct 2016 23:58:08 GMT
OK I have a small test case showing the issue!

I opened https://issues.apache.org/jira/browse/LUCENE-7491

Thanks for reporting this, Hans.

Mike McCandless

http://blog.mikemccandless.com

On Tue, Oct 11, 2016 at 12:08 PM, Hans Lund <ha.lund@gmail.com> wrote:
> Hmm, you're right. Once it revealed a bug in our indexing code I stopped
> wondering ;-) I have now tried to create small tests to show the behavior,
> so far without success. I'm pretty sure that I can reproduce it by
> re-introducing our indexing bug; unfortunately it only occurs after some
> hours of parsing and indexing Wikipedia dumps, but from there I'll try to
> simplify a test that reproduces the setup.
>
> The setup we use is quite straightforward: MMapDirectory and an NRT setup.
> The only tailored functionality is our own IndexDeletionPolicy, which uses
> an added timestamp in the userdata of the index commit to keep a number of
> snapshots while honoring a max retention period. Not that I suspect it to
> be the cause, but if fieldinfos from another snapshot are used in the merge,
> that could cause problems.
>
> Hans Lund
>
> On Tue, Oct 11, 2016 at 12:07 PM, Michael McCandless
> <lucene@mikemccandless.com> wrote:
>>
>> Hmm, that should be "OK" from Lucene's standpoint.
>>
>> I mean, it should not result in strange merge exceptions later on.
>>
>> I think there's a bug somewhere in Lucene's efforts to pretend it's
>> fully schema-less ... I'll try to reproduce this.
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Tue, Oct 11, 2016 at 4:38 AM, Hans Lund <ha.lund@gmail.com> wrote:
>> > Turned out to be much simpler - we had added a new 'dynamic' field to
>> > a stats doc: a count of articles based on the identified language code.
>> > With a set of test documents in German, English, and Swedish, no one had
>> > suspected the obvious: the language detection categorized a single
>> > document as Indonesian, making the stats count id:1.
>> >
>> > I realized that the debug output I added printed everything other than
>> > the interesting field (it iterates over the already added fields, not
>> > the field causing the error later on ;-)
>> >
>> > On Mon, Oct 10, 2016 at 4:32 PM, Adrien Grand <jpountz@gmail.com> wrote:
>> >
>> >> It looks like the field infos of your index went out of sync with data
>> >> stored in the files about points.
>> >>
>> >> Can you run CheckIndex on your index (potentially with the `-fast`
>> >> option so that it only verifies checksums)? It could be that one of
>> >> these two parts of the index got corrupted.
>> >>
>> >> Since you were able to modify the way add(IndexableField) is
>> >> implemented, I'm wondering if you are running a fork of Lucene? If yes,
>> >> maybe you made some changes that triggered this bug?
>> >>
>> >> Otherwise, is your application:
>> >>  - using IndexWriter.addIndexes?
>> >>  - customizing merging in some way, e.g. by wrapping the merge readers?
>> >>
>> >> On Tue, Oct 4, 2016 at 4:40 PM, Hans Lund <ha.lund@gmail.com> wrote:
>> >>
>> >> > After upgrading to 6.2 we are having problems during merges (after
>> >> > running for a while).
>> >> >
>> >> > When the problem occurs it's always complaining about the same field
>> >> > and throws:
>> >> >
>> >> > java.lang.IllegalArgumentException: field="id" did not index point values
>> >> >     at org.apache.lucene.codecs.lucene60.Lucene60PointsReader.getBKDReader(Lucene60PointsReader.java:126)
>> >> >     at org.apache.lucene.codecs.lucene60.Lucene60PointsReader.size(Lucene60PointsReader.java:224)
>> >> >     at org.apache.lucene.codecs.lucene60.Lucene60PointsWriter.merge(Lucene60PointsWriter.java:169)
>> >> >     at org.apache.lucene.index.SegmentMerger.mergePoints(SegmentMerger.java:173)
>> >> >     at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:122)
>> >> >     at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4312)
>> >> >     at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3889)
>> >> >
>> >> >
>> >> > To figure out where we messed up - I have added some ugly logging to
>> >> > Document:
>> >> >
>> >> > public final void add(IndexableField field) {
>> >> >     // When an "id" field carrying point values arrives, log the fields
>> >> >     // already in the document (the incoming field itself is not printed).
>> >> >     if ("id".equals(field.name())
>> >> >             && field.fieldType().pointDimensionCount() != 0) {
>> >> >         System.err.println("Point value detected");
>> >> >         for (IndexableField i : fields) {
>> >> >             System.err.println(i);
>> >> >         }
>> >> >     }
>> >> >     fields.add(field);
>> >> > }
>> >> >
>> >> > In the hope of intercepting the document we messed up.
>> >> >
>> >> > But to my surprise, toString on the suspected field just says (it
>> >> > contains a URN):
>> >> >
>> >> > indexed,omitNorms,indexOptions=DOCS<id:urn:wiki:doc:YEL:57028#1-1>
>> >> >
>> >> > So any hints as to why field.fieldType().pointDimensionCount() != 0,
>> >> > and any suggestions as to what might cause this?
>> >> >
>> >> > Regards
>> >> > Hans Lund
>> >> >
>> >>
>
>
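
The custom deletion policy Hans describes (keep a number of snapshots, but honor a maximum retention period based on a timestamp stored in the commit user data) could look roughly like the sketch below. It is plain Java with no Lucene on the classpath: Commit, selectForDeletion, and all parameter names are hypothetical stand-ins, not Lucene's IndexDeletionPolicy API and not Hans's actual code. It also assumes the newest commit is always kept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RetentionSketch {

    /** Hypothetical stand-in for a commit point; only the user-data timestamp matters here. */
    static final class Commit {
        final String name;
        final long timestampMillis;

        Commit(String name, long timestampMillis) {
            this.name = name;
            this.timestampMillis = timestampMillis;
        }

        @Override
        public String toString() {
            return name;
        }
    }

    /**
     * Selects the commits to delete from a list ordered oldest to newest.
     * The newest commit is always kept. An older commit is deleted when it is
     * not among the newest maxSnapshots commits, or when it is older than
     * maxAgeMillis relative to nowMillis.
     */
    static List<Commit> selectForDeletion(List<Commit> commits, int maxSnapshots,
                                          long maxAgeMillis, long nowMillis) {
        List<Commit> toDelete = new ArrayList<>();
        int newest = commits.size() - 1;
        for (int i = 0; i < newest; i++) {
            Commit c = commits.get(i);
            boolean beyondCount = (newest - i) >= maxSnapshots;
            boolean tooOld = nowMillis - c.timestampMillis > maxAgeMillis;
            if (beyondCount || tooOld) {
                toDelete.add(c);
            }
        }
        return toDelete;
    }

    public static void main(String[] args) {
        List<Commit> commits = Arrays.asList(
                new Commit("c1", 0L), new Commit("c2", 1000L),
                new Commit("c3", 2000L), new Commit("c4", 3000L));
        // Keep at most 3 snapshots, none older than 1.5 seconds at time 3000.
        System.out.println(selectForDeletion(commits, 3, 1500L, 3000L));
    }
}
```

In a real policy the same decision would be made inside the onCommit callback, reading the timestamp from each commit's user data; that wiring is left out because it depends on the Lucene version in use.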

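The root cause Hans found, a dynamic stats field named after the detected language code colliding with the existing id field ("id" is also the ISO 639-1 code for Indonesian), can be guarded against on the application side. A minimal sketch, assuming the count was indexed as a point field while id was indexed as a plain term field; FieldKindGuard, Kind, and statsFieldName are hypothetical names, and nothing here is Lucene API.

```java
import java.util.HashMap;
import java.util.Map;

final class FieldKindGuard {

    /** How a field name was indexed; a deliberately coarse, hypothetical distinction. */
    enum Kind { TERMS, POINTS }

    private final Map<String, Kind> seen = new HashMap<>();

    /**
     * Records the kind used for a field name the first time it is seen and
     * rejects any later use of the same name with a different kind.
     */
    void check(String fieldName, Kind kind) {
        Kind previous = seen.putIfAbsent(fieldName, kind);
        if (previous != null && previous != kind) {
            throw new IllegalArgumentException("field=\"" + fieldName
                    + "\" was first indexed as " + previous + " but is now " + kind);
        }
    }

    /** Prefixes dynamic per-language counters so they cannot collide with fixed field names. */
    static String statsFieldName(String languageCode) {
        return "lang_count_" + languageCode;
    }

    public static void main(String[] args) {
        FieldKindGuard guard = new FieldKindGuard();
        guard.check("id", Kind.TERMS);                  // the regular document id
        guard.check(statsFieldName("id"), Kind.POINTS); // Indonesian counter, no clash
        try {
            guard.check("id", Kind.POINTS);             // the unprefixed counter clashes
        } catch (IllegalArgumentException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

Calling check before every add would have flagged the offending document at indexing time rather than hours later during a merge.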