lucene-solr-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: segment gets corrupted (after background merge ?)
Date Fri, 14 Jan 2011 18:55:07 GMT
OK given that you're seeing non-deterministic results on the same
index... I think this is likely a hardware issue or a JRE bug?
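
One way to check whether repeated runs agree is to reduce each CheckIndex
report to its failing (segment, test) pairs and compare them. Below is a
minimal Python sketch, assuming the report layout matches the output quoted
further down in this thread ("N of M: name=..." segment headers followed by
"test: ...OK" or "test: ...ERROR" lines). The names failed_tests and
runs_agree are made up for illustration; they are not part of Lucene.

```python
import re

# Segment header, e.g. "  1 of 17: name=_nqt docCount=431889"
SEGMENT_RE = re.compile(r"^\s*\d+ of \d+: name=(\S+)")
# Test line, e.g. "    test: terms, freq, prox...ERROR [...]"
TEST_RE = re.compile(r"^\s*test: (.+?)\.+(OK|ERROR)")


def failed_tests(report):
    """Map each segment name to the list of tests that reported ERROR."""
    failures = {}
    segment = None
    for line in report.splitlines():
        header = SEGMENT_RE.match(line)
        if header:
            segment = header.group(1)
            continue
        test = TEST_RE.match(line)
        if test and segment is not None and test.group(2) == "ERROR":
            failures.setdefault(segment, []).append(test.group(1))
    return failures


def runs_agree(reports):
    """True if every report fails the same tests on the same segments."""
    results = [failed_tests(r) for r in reports]
    return all(r == results[0] for r in results)
```

This only compares which test failed on which segment, not the exception
text, so two runs that fail the same test on different terms would still
count as agreeing; comparing the bracketed message as well would be stricter.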

If you move that index over to another env and run CheckIndex, is it consistent?
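
Before and after copying, it may help to checksum every file in the index
directory, both to confirm the copy is byte-identical and to see whether two
reads of the same file on the same machine return the same bytes. Below is a
rough Python sketch; manifest and differing_files are hypothetical helper
names, and SHA-256 is an arbitrary choice of digest. It assumes nothing is
writing to the index between passes (so Solr and replication are stopped).

```python
import hashlib
from pathlib import Path


def manifest(index_dir, chunk_size=1 << 20):
    """Return {file name: SHA-256 hex digest} for each file in index_dir."""
    digests = {}
    for path in sorted(Path(index_dir).iterdir()):
        if not path.is_file():
            continue
        h = hashlib.sha256()
        with path.open("rb") as f:
            # Read in chunks so multi-GB segment files do not fill memory.
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        digests[path.name] = h.hexdigest()
    return digests


def differing_files(a, b):
    """Names whose digest differs, or that exist in only one manifest."""
    return sorted(n for n in a.keys() | b.keys() if a.get(n) != b.get(n))
```

Matching digests across two passes do not rule out a disk problem, since the
second read may be served from the OS page cache. Differing digests on an
index that nobody is writing to would point at the machine rather than at
Lucene.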

Mike

On Fri, Jan 14, 2011 at 9:00 AM, Stéphane Delprat
<stephane.delprat@blogspirit.com> wrote:
> So I ran checkIndex (without -fix) 5 times in a row :
>
> SOLR was running, but no client connected to it. (just the slave which was
> synchronizing every 5 minutes)
>
> summary :
>
> 1: all good
> 2: 2 errors: (seg 1 & 2) terms, freq, prox...ERROR [term blog_id:104150: doc
> 324697 <= lastDoc 324697] & terms, freq, prox...ERROR [term
> blog_keywords:SPORT: doc 174808 <= lastDoc 174808]
> 3: 1 error: (seg 2) terms, freq, prox...ERROR [Index: 105, Size: 51]
> 4: all good
> 5: 1 error: (seg 7) terms, freq, prox...ERROR [term blog_comments: %X
> docFreq=1 != num docs seen 0 + num docs deleted 0]
>
> Seems to me that some random things are happening here.
>
> File system is ext3, on a physical server.
>
>
> Here are the logs of the interesting segments :
>
> ** 1 **
>
>  1 of 17: name=_nqt docCount=431889
>    compound=false
>    hasProx=true
>    numFiles=9
>    size (MB)=1,671.375
>    diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64,
> os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06
> 01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20,
> java.vendor=Sun Microsystems Inc.}
>    has deletions [delFileName=_nqt_1y2.del]
>    test: open reader.........OK [41918 deleted docs]
>    test: fields..............OK [51 fields]
>    test: field norms.........OK [51 fields]
>    test: terms, freq, prox...OK [5211271 terms; 39824029 terms/docs pairs;
> 59357374 tokens]
>    test: stored fields.......OK [11505678 total field count; avg 29.504
> fields per doc]
>    test: term vectors........OK [0 total vector count; avg 0 term/freq
> vector fields per doc]
>
>  2 of 17: name=_ol7 docCount=913886
>    compound=false
>    hasProx=true
>    numFiles=9
>    size (MB)=3,567.739
>    diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64,
> os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06
> 01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20,
> java.vendor=Sun Microsystems Inc.}
>    has deletions [delFileName=_ol7_1mc.del]
>    test: open reader.........OK [74076 deleted docs]
>    test: fields..............OK [51 fields]
>    test: field norms.........OK [51 fields]
>    test: terms, freq, prox...OK [9825896 terms; 93954470 terms/docs pairs;
> 132337348 tokens]
>    test: stored fields.......OK [26933113 total field count; avg 32.07
> fields per doc]
>    test: term vectors........OK [0 total vector count; avg 0 term/freq
> vector fields per doc]
>
> ** 2 **
>
>  1 of 17: name=_nqt docCount=431889
>    compound=false
>    hasProx=true
>    numFiles=9
>    size (MB)=1,671.375
>    diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64,
> os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06
> 01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20,
> java.vendor=Sun Microsystems Inc.}
>    has deletions [delFileName=_nqt_1y2.del]
>    test: open reader.........OK [41918 deleted docs]
>    test: fields..............OK [51 fields]
>    test: field norms.........OK [51 fields]
>    test: terms, freq, prox...ERROR [term blog_id:104150: doc 324697 <=
> lastDoc 324697]
> java.lang.RuntimeException: term blog_id:104150: doc 324697 <= lastDoc
> 324697
>        at
> org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:644)
>        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:530)
>        at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)
>    test: stored fields.......OK [11505678 total field count; avg 29.504
> fields per doc]
>    test: term vectors........OK [0 total vector count; avg 0 term/freq
> vector fields per doc]
> FAILED
>    WARNING: fixIndex() would remove reference to this segment; full
> exception:
> java.lang.RuntimeException: Term Index test failed
>        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:543)
>        at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)
>
>  2 of 17: name=_ol7 docCount=913886
>    compound=false
>    hasProx=true
>    numFiles=9
>    size (MB)=3,567.739
>    diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64,
> os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06
> 01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20,
> java.vendor=Sun Microsystems Inc.}
>    has deletions [delFileName=_ol7_1mc.del]
>    test: open reader.........OK [74076 deleted docs]
>    test: fields..............OK [51 fields]
>    test: field norms.........OK [51 fields]
>    test: terms, freq, prox...ERROR [term blog_keywords:SPORT: doc 174808 <=
> lastDoc 174808]
> java.lang.RuntimeException: term blog_keywords:SPORT: doc 174808 <= lastDoc
> 174808
>        at
> org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:644)
>        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:530)
>        at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)
>    test: stored fields.......OK [26933113 total field count; avg 32.07
> fields per doc]
>    test: term vectors........OK [0 total vector count; avg 0 term/freq
> vector fields per doc]
> FAILED
>    WARNING: fixIndex() would remove reference to this segment; full
> exception:
> java.lang.RuntimeException: Term Index test failed
>        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:543)
>        at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)
>
> ** 3 **
>
>  1 of 17: name=_nqt docCount=431889
>    compound=false
>    hasProx=true
>    numFiles=9
>    size (MB)=1,671.375
>    diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64,
> os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06
> 01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20,
> java.vendor=Sun Microsystems Inc.}
>    has deletions [delFileName=_nqt_1y2.del]
>    test: open reader.........OK [41918 deleted docs]
>    test: fields..............OK [51 fields]
>    test: field norms.........OK [51 fields]
>    test: terms, freq, prox...OK [5211271 terms; 39824029 terms/docs pairs;
> 59357374 tokens]
>    test: stored fields.......OK [11505678 total field count; avg 29.504
> fields per doc]
>    test: term vectors........OK [0 total vector count; avg 0 term/freq
> vector fields per doc]
>
>  2 of 17: name=_ol7 docCount=913886
>    compound=false
>    hasProx=true
>    numFiles=9
>    size (MB)=3,567.739
>    diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64,
> os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06
> 01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20,
> java.vendor=Sun Microsystems Inc.}
>    has deletions [delFileName=_ol7_1mc.del]
>    test: open reader.........OK [74076 deleted docs]
>    test: fields..............OK [51 fields]
>    test: field norms.........OK [51 fields]
>    test: terms, freq, prox...ERROR [Index: 105, Size: 51]
> java.lang.IndexOutOfBoundsException: Index: 105, Size: 51
>        at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>        at java.util.ArrayList.get(ArrayList.java:322)
>        at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:288)
>        at org.apache.lucene.index.FieldInfos.fieldName(FieldInfos.java:277)
>        at org.apache.lucene.index.TermBuffer.read(TermBuffer.java:86)
>        at
> org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:129)
>        at
> org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:631)
>        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:530)
>        at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)
>    test: stored fields.......OK [26933113 total field count; avg 32.07
> fields per doc]
>    test: term vectors........OK [0 total vector count; avg 0 term/freq
> vector fields per doc]
> FAILED
>    WARNING: fixIndex() would remove reference to this segment; full
> exception:
> java.lang.RuntimeException: Term Index test failed
>        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:543)
>        at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)
>
> ** 4 **
>
>  1 of 17: name=_nqt docCount=431889
>    compound=false
>    hasProx=true
>    numFiles=9
>    size (MB)=1,671.375
>    diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64,
> os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06
> 01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20,
> java.vendor=Sun Microsystems Inc.}
>    has deletions [delFileName=_nqt_1y2.del]
>    test: open reader.........OK [41918 deleted docs]
>    test: fields..............OK [51 fields]
>    test: field norms.........OK [51 fields]
>    test: terms, freq, prox...OK [5211271 terms; 39824029 terms/docs pairs;
> 59357374 tokens]
>    test: stored fields.......OK [11505678 total field count; avg 29.504
> fields per doc]
>    test: term vectors........OK [0 total vector count; avg 0 term/freq
> vector fields per doc]
>
>  2 of 17: name=_ol7 docCount=913886
>    compound=false
>    hasProx=true
>    numFiles=9
>    size (MB)=3,567.739
>    diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64,
> os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06
> 01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20,
> java.vendor=Sun Microsystems Inc.}
>    has deletions [delFileName=_ol7_1mc.del]
>    test: open reader.........OK [74076 deleted docs]
>    test: fields..............OK [51 fields]
>    test: field norms.........OK [51 fields]
>    test: terms, freq, prox...OK [9825896 terms; 93954470 terms/docs pairs;
> 132337349 tokens]
>    test: stored fields.......OK [26933113 total field count; avg 32.07
> fields per doc]
>    test: term vectors........OK [0 total vector count; avg 0 term/freq
> vector fields per doc]
>
>  7 of 17: name=_qqy docCount=69109
>    compound=false
>    hasProx=true
>    numFiles=9
>    size (MB)=468.861
>    diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64,
> os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06
> 01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20,
> java.vendor=Sun Microsystems Inc.}
>    has deletions [delFileName=_qqy_18.del]
>    test: open reader.........OK [22672 deleted docs]
>    test: fields..............OK [51 fields]
>    test: field norms.........OK [51 fields]
>    test: terms, freq, prox...OK [1431363 terms; 10684603 terms/docs pairs;
> 13097892 tokens]
>    test: stored fields.......OK [1688599 total field count; avg 36.363
> fields per doc]
>    test: term vectors........OK [0 total vector count; avg 0 term/freq
> vector fields per doc]
>
> ** 5 **
>
>  1 of 17: name=_nqt docCount=431889
>    compound=false
>    hasProx=true
>    numFiles=9
>    size (MB)=1,671.375
>    diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64,
> os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06
> 01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20,
> java.vendor=Sun Microsystems Inc.}
>    has deletions [delFileName=_nqt_1y2.del]
>    test: open reader.........OK [41918 deleted docs]
>    test: fields..............OK [51 fields]
>    test: field norms.........OK [51 fields]
>    test: terms, freq, prox...OK [5211271 terms; 39824029 terms/docs pairs;
> 59357374 tokens]
>    test: stored fields.......OK [11505678 total field count; avg 29.504
> fields per doc]
>    test: term vectors........OK [0 total vector count; avg 0 term/freq
> vector fields per doc]
>
>  2 of 17: name=_ol7 docCount=913886
>    compound=false
>    hasProx=true
>    numFiles=9
>    size (MB)=3,567.739
>    diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64,
> os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06
> 01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20,
> java.vendor=Sun Microsystems Inc.}
>    has deletions [delFileName=_ol7_1mc.del]
>    test: open reader.........OK [74076 deleted docs]
>    test: fields..............OK [51 fields]
>    test: field norms.........OK [51 fields]
>    test: terms, freq, prox...OK [9825896 terms; 93954470 terms/docs pairs;
> 132337348 tokens]
>    test: stored fields.......OK [26933113 total field count; avg 32.07
> fields per doc]
>    test: term vectors........OK [0 total vector count; avg 0 term/freq
> vector fields per doc]
>
>  7 of 17: name=_qqy docCount=69109
>    compound=false
>    hasProx=true
>    numFiles=9
>    size (MB)=468.861
>    diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64,
> os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06
> 01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20,
> java.vendor=Sun Microsystems Inc.}
>    has deletions [delFileName=_qqy_18.del]
>    test: open reader.........OK [22672 deleted docs]
>    test: fields..............OK [51 fields]
>    test: field norms.........OK [51 fields]
>    test: terms, freq, prox...ERROR [term blog_comments: %X docFreq=1 != num
> docs seen 0 + num docs deleted 0]
> java.lang.RuntimeException: term blog_comments: %X docFreq=1 != num docs
> seen 0 + num docs deleted 0
>        at
> org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:675)
>        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:530)
>        at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)
>    test: stored fields.......OK [1688599 total field count; avg 36.363
> fields per doc]
>    test: term vectors........OK [0 total vector count; avg 0 term/freq
> vector fields per doc]
> FAILED
>    WARNING: fixIndex() would remove reference to this segment; full
> exception:
> java.lang.RuntimeException: Term Index test failed
>        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:543)
>        at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)
>
>
>
> Thanks,
>
>
> Le 14/01/2011 12:59, Michael McCandless a écrit :
>>
>> Right, but removing a segment out from under a live IW (when you run
>> CheckIndex with -fix) is deadly, because that other IW doesn't know
>> you've removed the segment, and will later commit a new segment infos
>> still referencing that segment.
>>
>> The nature of this particular exception from CheckIndex is very
>> strange... I think it can only be a bug in Lucene, a bug in the JRE or
>> a hardware issue (bits are flipping somewhere).
>>
>> I don't think an error in the IO system can cause this particular
>> exception (it would cause others), because the deleted docs are loaded
>> up front when SegmentReader is init'd...
>>
>> This is why I'd really like to see if a given corrupt index always
>> hits precisely the same exception if you run CheckIndex more than
>> once.
>>
>> Mike
>>
>> On Thu, Jan 13, 2011 at 10:56 PM, Lance Norskog<goksron@gmail.com>  wrote:
>>>
>>> 1) CheckIndex is not supposed to change a corrupt segment, only remove
>>> it.
>>> 2) Are you using local hard disks, or do you run on a common SAN or remote
>>> file server? I have seen corruption errors on SANs, where existing
>>> files have random changes.
>>>
>>> On Thu, Jan 13, 2011 at 11:06 AM, Michael McCandless
>>> <lucene@mikemccandless.com>  wrote:
>>>>
>>>> Generally it's not safe to run CheckIndex if a writer is also open on
>>>> the index.
>>>>
>>>> It's not safe because CheckIndex could hit FNFE's on opening files,
>>>> or, if you use -fix, CheckIndex will change the index out from under
>>>> your other IndexWriter (which will then cause other kinds of
>>>> corruption).
>>>>
>>>> That said, I don't think the corruption that CheckIndex is detecting
>>>> in your index would be caused by having a writer open on the index.
>>>> Your first CheckIndex has a different deletes file (_phe_p3.del, with
>>>> 44824 deleted docs) than the 2nd time you ran it (_phe_p4.del, with
>>>> 44828 deleted docs), so it must somehow have to do with that change.
>>>>
>>>> One question: if you have a corrupt index, and run CheckIndex on it
>>>> several times in a row, does it always fail in the same way?  (Ie the
>>>> same term hits the below exception).
>>>>
>>>> Is there any way I could get a copy of one of your corrupt cases?  I
>>>> can then dig...
>>>>
>>>> Mike
>>>>
>>>> On Thu, Jan 13, 2011 at 10:52 AM, Stéphane Delprat
>>>> <stephane.delprat@blogspirit.com>  wrote:
>>>>>
>>>>> I understand less and less what is happening to my solr.
>>>>>
>>>>> I did a checkIndex (without -fix) and there was an error...
>>>>>
>>>>> So I did another checkIndex with -fix and then the error was gone. The
>>>>> segment was alright.
>>>>>
>>>>>
>>>>> During checkIndex I do not shut down the solr server, I just make sure
>>>>> no client connects to the server.
>>>>>
>>>>> Should I shut down the solr server during checkIndex ?
>>>>>
>>>>>
>>>>>
>>>>> first checkIndex :
>>>>>
>>>>>  4 of 17: name=_phe docCount=264148
>>>>>    compound=false
>>>>>    hasProx=true
>>>>>    numFiles=9
>>>>>    size (MB)=928.977
>>>>>    diagnostics = {optimize=false, mergeFactor=10,
>>>>> os.version=2.6.26-2-amd64,
>>>>> os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06
>>>>> 01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20,
>>>>> java.vendor=Sun Microsystems Inc.}
>>>>>    has deletions [delFileName=_phe_p3.del]
>>>>>    test: open reader.........OK [44824 deleted docs]
>>>>>    test: fields..............OK [51 fields]
>>>>>    test: field norms.........OK [51 fields]
>>>>>    test: terms, freq, prox...ERROR [term post_id:562 docFreq=1 != num docs
>>>>> seen 0 + num docs deleted 0]
>>>>> java.lang.RuntimeException: term post_id:562 docFreq=1 != num docs seen 0 +
>>>>> num docs deleted 0
>>>>>        at
>>>>> org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:675)
>>>>>        at
>>>>> org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:530)
>>>>>        at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)
>>>>>    test: stored fields.......OK [7206878 total field count; avg 32.86
>>>>> fields per doc]
>>>>>    test: term vectors........OK [0 total vector count; avg 0 term/freq
>>>>> vector fields per doc]
>>>>> FAILED
>>>>>    WARNING: fixIndex() would remove reference to this segment; full
>>>>> exception:
>>>>> java.lang.RuntimeException: Term Index test failed
>>>>>        at
>>>>> org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:543)
>>>>>        at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)
>>>>>
>>>>>
>>>>> a few minutes later :
>>>>>
>>>>>  4 of 18: name=_phe docCount=264148
>>>>>    compound=false
>>>>>    hasProx=true
>>>>>    numFiles=9
>>>>>    size (MB)=928.977
>>>>>    diagnostics = {optimize=false, mergeFactor=10,
>>>>> os.version=2.6.26-2-amd64,
>>>>> os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06
>>>>> 01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20,
>>>>> java.vendor=Sun Microsystems Inc.}
>>>>>    has deletions [delFileName=_phe_p4.del]
>>>>>    test: open reader.........OK [44828 deleted docs]
>>>>>    test: fields..............OK [51 fields]
>>>>>    test: field norms.........OK [51 fields]
>>>>>    test: terms, freq, prox...OK [3200899 terms; 26804334 terms/docs pairs;
>>>>> 28919124 tokens]
>>>>>    test: stored fields.......OK [7206764 total field count; avg 32.86
>>>>> fields per doc]
>>>>>    test: term vectors........OK [0 total vector count; avg 0 term/freq
>>>>> vector fields per doc]
>>>>>
>>>>>
>>>>> Le 12/01/2011 16:50, Michael McCandless a écrit :
>>>>>>
>>>>>> Curious... is it always a docFreq=1 != num docs seen 0 + num docs
>>>>>> deleted 0?
>>>>>>
>>>>>> It looks like new deletions were flushed against the segment (del
>>>>>> file changed from _ncc_22s.del to _ncc_24f.del).
>>>>>>
>>>>>> Are you hitting any exceptions during indexing?
>>>>>>
>>>>>> Mike
>>>>>>
>>>>>> On Wed, Jan 12, 2011 at 10:33 AM, Stéphane Delprat
>>>>>> <stephane.delprat@blogspirit.com>    wrote:
>>>>>>>
>>>>>>> I got another corruption.
>>>>>>>
>>>>>>> It sure looks like it's the same type of error (on a different field).
>>>>>>>
>>>>>>> It's also not linked to a merge, since the segment size did not change.
>>>>>>>
>>>>>>>
>>>>>>> *** good segment :
>>>>>>>
>>>>>>>  1 of 9: name=_ncc docCount=1841685
>>>>>>>    compound=false
>>>>>>>    hasProx=true
>>>>>>>    numFiles=9
>>>>>>>    size (MB)=6,683.447
>>>>>>>    diagnostics = {optimize=false, mergeFactor=10,
>>>>>>> os.version=2.6.26-2-amd64,
>>>>>>> os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06
>>>>>>> 01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20,
>>>>>>> java.vendor=Sun Microsystems Inc.}
>>>>>>>    has deletions [delFileName=_ncc_22s.del]
>>>>>>>    test: open reader.........OK [275881 deleted docs]
>>>>>>>    test: fields..............OK [51 fields]
>>>>>>>    test: field norms.........OK [51 fields]
>>>>>>>    test: terms, freq, prox...OK [17952652 terms; 174113812 terms/docs
>>>>>>> pairs; 204561440 tokens]
>>>>>>>    test: stored fields.......OK [45511958 total field count; avg 29.066
>>>>>>> fields per doc]
>>>>>>>    test: term vectors........OK [0 total vector count; avg 0 term/freq
>>>>>>> vector fields per doc]
>>>>>>>
>>>>>>>
>>>>>>> a few hours later :
>>>>>>>
>>>>>>> *** broken segment :
>>>>>>>
>>>>>>>  1 of 17: name=_ncc docCount=1841685
>>>>>>>    compound=false
>>>>>>>    hasProx=true
>>>>>>>    numFiles=9
>>>>>>>    size (MB)=6,683.447
>>>>>>>    diagnostics = {optimize=false, mergeFactor=10,
>>>>>>> os.version=2.6.26-2-amd64,
>>>>>>> os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06
>>>>>>> 01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20,
>>>>>>> java.vendor=Sun Microsystems Inc.}
>>>>>>>    has deletions [delFileName=_ncc_24f.del]
>>>>>>>    test: open reader.........OK [278167 deleted docs]
>>>>>>>    test: fields..............OK [51 fields]
>>>>>>>    test: field norms.........OK [51 fields]
>>>>>>>    test: terms, freq, prox...ERROR [term post_id:1599104 docFreq=1 != num
>>>>>>> docs seen 0 + num docs deleted 0]
>>>>>>> java.lang.RuntimeException: term post_id:1599104 docFreq=1 != num docs
>>>>>>> seen 0 + num docs deleted 0
>>>>>>>        at
>>>>>>> org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:675)
>>>>>>>        at
>>>>>>> org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:530)
>>>>>>>        at
>>>>>>> org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)
>>>>>>>    test: stored fields.......OK [45429565 total field count; avg 29.056
>>>>>>> fields per doc]
>>>>>>>    test: term vectors........OK [0 total vector count; avg 0 term/freq
>>>>>>> vector fields per doc]
>>>>>>> FAILED
>>>>>>>    WARNING: fixIndex() would remove reference to this segment; full
>>>>>>> exception:
>>>>>>> java.lang.RuntimeException: Term Index test failed
>>>>>>>        at
>>>>>>> org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:543)
>>>>>>>        at
>>>>>>> org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)
>>>>>>>
>>>>>>>
>>>>>>> I'll activate infoStream for next time.
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>>
>>>>>>> Le 12/01/2011 00:49, Michael McCandless a écrit :
>>>>>>>>
>>>>>>>> When you hit corruption is it always this same problem?:
>>>>>>>>
>>>>>>>>   java.lang.RuntimeException: term source:margolisphil docFreq=1 !=
>>>>>>>> num docs seen 0 + num docs deleted 0
>>>>>>>>
>>>>>>>> Can you run with Lucene's IndexWriter infoStream turned on, and catch
>>>>>>>> the output leading to the corruption?  If something is somehow messing
>>>>>>>> up the bits in the deletes file that could cause this.
>>>>>>>>
>>>>>>>> Mike
>>>>>>>>
>>>>>>>> On Mon, Jan 10, 2011 at 5:52 AM, Stéphane Delprat
>>>>>>>> <stephane.delprat@blogspirit.com>      wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> We are using :
>>>>>>>>> Solr Specification Version: 1.4.1
>>>>>>>>> Solr Implementation Version: 1.4.1 955763M - mark - 2010-06-17 18:06:42
>>>>>>>>> Lucene Specification Version: 2.9.3
>>>>>>>>> Lucene Implementation Version: 2.9.3 951790 - 2010-06-06 01:30:55
>>>>>>>>>
>>>>>>>>> # java -version
>>>>>>>>> java version "1.6.0_20"
>>>>>>>>> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
>>>>>>>>> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)
>>>>>>>>>
>>>>>>>>> We want to index 4M docs in one core (and when it works fine we will
>>>>>>>>> add other cores with 2M on the same server) (1 doc ~= 1kB).
>>>>>>>>>
>>>>>>>>> We use Solr replication every 5 minutes to update the slave server
>>>>>>>>> (queries are executed on the slave only).
>>>>>>>>>
>>>>>>>>> Documents change very quickly; during a normal day we will have
>>>>>>>>> approximately:
>>>>>>>>> * 200 000 updated docs
>>>>>>>>> * 1000 new docs
>>>>>>>>> * 200 deleted docs
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I attached the last good checkIndex : solr20110107.txt
>>>>>>>>> And the corrupted one : solr20110110.txt
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This is not the first time a segment gets corrupted on this server,
>>>>>>>>> that's why I ran frequent "checkIndex". (But as you can see, the first
>>>>>>>>> segment is 1.800.000 docs and it works fine!)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I can't find any "SEVER" "FATAL" or "exception" in the Solr logs.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I also attached my schema.xml and solrconfig.xml
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Is there something wrong with what we are doing? Do you need other
>>>>>>>>> info?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Lance Norskog
>>> goksron@gmail.com
>>>
>>
>
