lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Strange index corruption related to numeric fields when upgrading from 6.0.1
Date Wed, 21 Sep 2016 16:29:44 GMT
Actually, it's more of a warning for both projects, I think. The danger here is
that lots of things will work just fine with characters in the field names
other than the ones that recommendation allows. But there's no explicit testing
(that I know of) for all the places that could be affected.
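
To make the recommendation concrete, here is a minimal sketch of a check against
the Ref Guide wording quoted further down this thread (alphanumeric or underscore
characters only, not starting with a digit). The class and method names are
hypothetical, "alphanumeric" is assumed to mean ASCII letters and digits, and
neither Lucene nor Solr is claimed to run such a check itself.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene or Solr.
final class FieldNameCheck {
    // Assumption: "alphanumeric" means ASCII letters and digits.
    // First character: letter or underscore; the rest: letters, digits, underscores.
    private static final Pattern RECOMMENDED =
            Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    // True if the whole name follows the recommended convention.
    static boolean isRecommended(String name) {
        return name != null && RECOMMENDED.matcher(name).matches();
    }

    public static void main(String[] args) {
        // The first two names come from this thread; the last two are made up.
        String[] names = {" [1]calculon", " [p]calculon", "calculon_1", "1calculon"};
        for (String n : names) {
            System.out.println("\"" + n + "\" -> " + isRecommended(n));
        }
    }
}
```

Running something like this over generated and user-supplied names at indexing
time would at least flag which fields fall outside the convention, without
forcing a rename.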

Admittedly, by the time you get to the Lucene level, lots of the places that
could be a problem are already behind you (e.g. much of query parsing, DIH, etc.),
but to me the benefits of different naming conventions aren't worth the risk of
suddenly starting to fail because you use a new code path.

But then I'm risk-averse ;).

Erick

On Wed, Sep 21, 2016 at 1:41 AM, Jan-Willem van den Broek
<Jan-Willem.van.den.Broek@valuecare.nl> wrote:
> Hi Erick,
>
> Isn't that a Solr restriction? I can't find anything about it in the Lucene docs.
>
> If it applies to Lucene as well, then we have some work to do, since the brackets are indeed part of the field name. (Also a space in front.) We use things like that a lot to avoid collisions in generated and user-supplied names.
>
> I don't think it's the key to this issue though. The change I made that fixed my test case still uses brackets and spaces. The Point and StoredField still use the name " [1]calculon", but the DoubleDocValuesField is renamed to " [p]calculon".
>
> Thanks for the suggestion though. I'd never even considered that we might be using illegal field names.
>
> Regards,
> Jan-Willem
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com]
> Sent: Tuesday, September 20, 2016 19:02
> To: java-user <java-user@lucene.apache.org>
> Subject: Re: Strange index corruption related to numeric fields when upgrading from 6.0.1
>
> A wild shot in the dark: Are the square brackets really part of the field name? They have never officially been supported, from the Ref Guide:
>
> "Field names should consist of alphanumeric or underscore characters only and not start with a digit. This is not currently strictly enforced, but other field names will not have first class support from all components and back compatibility is not guaranteed"
>
> Your statement "I cannot reproduce the issue if I give the DoubleDocValuesField a different name" seems to indicate that it's not a code problem with Lucene if you don't put the brackets in...
>
> Best,
> Erick
>
> On Tue, Sep 20, 2016 at 9:04 AM, Jan-Willem van den Broek <Jan-Willem.van.den.Broek@valuecare.nl> wrote:
>> Hi all,
>>
>> I have an application that works fine with 6.0.1, but if I go to 6.1.0 or 6.2.0 then I occasionally get a corrupted index where the SegmentMerger keeps breaking on a numeric field.
>>
>> This is the exception I get:
>>
>> ... (stack of application code) ...
>> Caused by: java.lang.IllegalArgumentException: field=" [1]calculon" did not index point values
>>         at org.apache.lucene.codecs.lucene60.Lucene60PointsReader.getBKDReader(Lucene60PointsReader.java:126)
>>         at org.apache.lucene.codecs.lucene60.Lucene60PointsReader.size(Lucene60PointsReader.java:224)
>>         at org.apache.lucene.codecs.lucene60.Lucene60PointsWriter.merge(Lucene60PointsWriter.java:169)
>>         at org.apache.lucene.index.SegmentMerger.mergePoints(SegmentMerger.java:173)
>>         at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:122)
>>         at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4312)
>>         at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3889)
>>         at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:588)
>>         at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:626)
>>
>> The field " [1]calculon" is always either a LongPoint or DoublePoint with 1 dimension. The documents containing this field also always contain both a StoredField and a DoubleDocValuesField with the same name.
>>
>> I cannot reproduce the issue if I give the DoubleDocValuesField a different name. Is that something that I should be doing in general? I was under the impression that it is OK to use the same name for all three related fields.
>>
>> Here is the infostream from a test that reproduces the issue:
>> http://wikisend.com/download/613238/merges.log
>>
>> Unfortunately, while I can reproduce the issue consistently in the full application, I don't yet have a clean test case with just/mostly Lucene code.
>>
>> Any feedback is much appreciated!
>>
>> Jan-Willem v/d Broek
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>
