lucene-dev mailing list archives

From Mihai Soloi <>
Subject Re: Checksum mismatch in segments file
Date Tue, 26 Jun 2012 18:29:42 GMT
Hello Mike and Robert,

I am using the stable version of Lucene (i.e. 3.6). What actually happens 
is that the checksum (a long) is written as 8 bytes: the first 4 are 0, 
and the deliberately mismatched value (checksum - 1) is written in the 
next 4 (see ChecksumIndexOutput.prepareCommit()). When finishCommit() 
runs, the correct checksum is written over it in the buffer, and on close 
the buffer is flushed to the directory.
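
The byte layout described above can be reproduced with plain JDK classes. This is a sketch under stated assumptions, not Lucene code: it assumes the checksum is a java.util.zip.CRC32 value and that writeLong emits its bytes big-endian; the class name ChecksumLayout is made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;

// Hypothetical demo class (not part of Lucene) showing the two 8-byte
// values written at the same offset during the two-phase commit.
public class ChecksumLayout {

    // Big-endian encoding of a long (ByteBuffer's default byte order).
    static byte[] longBytes(long value) {
        return ByteBuffer.allocate(8).putLong(value).array();
    }

    // CRC32 over the given bytes; the result always fits in 32 bits.
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] payload = "segments payload".getBytes(StandardCharsets.UTF_8);
        long checksum = crc32(payload);

        // Phase 1 (prepareCommit): a deliberately wrong value, checksum - 1.
        byte[] prepared = longBytes(checksum - 1);
        // Phase 2 (finishCommit): the real checksum, meant to overwrite phase 1.
        byte[] finished = longBytes(checksum);

        // A CRC32 value is below 2^32, so its upper four bytes are zero
        // (for checksum - 1 this holds as long as checksum != 0).
        System.out.println("prepare: " + Arrays.toString(prepared));
        System.out.println("finish:  " + Arrays.toString(finished));
    }
}
```

Only the last 4 bytes differ between the two phases, which matches what the logged bytes should show.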

A comment in the code states that the mismatched value is written on 
purpose, for better testing. I've stepped through the code in the 
debugger and logged the bytes being written, and I can confirm that the 
seek back and the overwrite happen as they should.
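
Instead of inspecting logged bytes, the seek-back-and-overwrite property can be asserted in a test: after rewriting the 8 checksum bytes the file must not grow, and a reader that recomputes the CRC32 over everything except the last 8 bytes must find a match. This is a minimal sketch, assuming a hypothetical in-memory SeekableBuffer as a stand-in for an IndexOutput; the reader-side comparison only mirrors the idea of the check in SegmentInfos, not its exact code.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.zip.CRC32;

// Hypothetical in-memory stand-in for an IndexOutput that supports seek().
public class SeekableBuffer {
    private byte[] data = new byte[16];
    private int length = 0;   // logical file length
    private int position = 0; // current write position

    void writeBytes(byte[] b) {
        ensureCapacity(position + b.length);
        System.arraycopy(b, 0, data, position, b.length);
        position += b.length;
        length = Math.max(length, position); // overwriting must not grow the file
    }

    void writeLong(long v) {
        writeBytes(ByteBuffer.allocate(8).putLong(v).array());
    }

    void seek(int pos) {
        if (pos < 0 || pos > length) throw new IllegalArgumentException("bad seek: " + pos);
        position = pos;
    }

    int getFilePointer() { return position; }
    int length() { return length; }
    byte[] toByteArray() { return Arrays.copyOf(data, length); }

    private void ensureCapacity(int needed) {
        if (needed > data.length) data = Arrays.copyOf(data, Math.max(needed, data.length * 2));
    }

    static long crc32(byte[] b, int len) {
        CRC32 crc = new CRC32();
        crc.update(b, 0, len);
        return crc.getValue();
    }

    // Reader-side check: CRC32 of all bytes except the trailing long
    // must equal that trailing long.
    static boolean checksumMatches(byte[] file) {
        if (file.length < 8) return false;
        long stored = ByteBuffer.wrap(file, file.length - 8, 8).getLong();
        return crc32(file, file.length - 8) == stored;
    }

    // Two-phase write: placeholder (checksum - 1), seek back, real checksum.
    static byte[] writeWithTwoPhaseChecksum(byte[] payload) {
        SeekableBuffer out = new SeekableBuffer();
        out.writeBytes(payload);
        long checksum = crc32(payload, payload.length);
        int pos = out.getFilePointer();
        out.writeLong(checksum - 1); // phase 1
        out.seek(pos);
        out.writeLong(checksum);     // phase 2 overwrites in place
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] payload = {1, 2, 3, 4, 5};
        byte[] file = writeWithTwoPhaseChecksum(payload);
        System.out.println("length ok:   " + (file.length == payload.length + 8));
        System.out.println("checksum ok: " + checksumMatches(file));
    }
}
```

Running the same two assertions against bytes read back from the HBase table, rather than from the in-memory buffer, would show whether persistence (and not the write path) is where the overwrite gets lost.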

You can run the test with 'mvn test 
-Dtest=org.apache.james.mailbox.lucene.hbase.IndexingTest', but expect 
a lot of byte output in the log.

I am now looking at the AppendingCodec in version 4 to see whether that 
implementation is a better fit for my directory.
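
The failure mode Mike suspects can be made concrete. If an output silently ignores seek() and keeps appending (whether the HBase output ever behaves this way is an assumption here), phase 2 lands after the placeholder instead of on top of it: the file ends in checksum - 1 followed by checksum, and the reader's CRC32 now covers the placeholder too, producing the same "checksum mismatch". The names below are hypothetical, and the forward-only variant, which writes the checksum once at the end, only illustrates the general idea of avoiding backward seeks, not AppendingCodec's actual code.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical comparison: an output whose seek() is lost versus a
// forward-only writer that never seeks.
public class AppendOnlyDemo {

    static long crc32(byte[] b, int len) {
        CRC32 crc = new CRC32();
        crc.update(b, 0, len);
        return crc.getValue();
    }

    static byte[] longBytes(long v) {
        return ByteBuffer.allocate(8).putLong(v).array();
    }

    // Reader-side idea from the thread: CRC32 of everything but the
    // last 8 bytes must equal the trailing long.
    static boolean checksumMatches(byte[] file) {
        if (file.length < 8) return false;
        long stored = ByteBuffer.wrap(file, file.length - 8, 8).getLong();
        return crc32(file, file.length - 8) == stored;
    }

    // Two-phase protocol on a stream that cannot seek: the "seek back" is
    // lost, so the real checksum is appended after the placeholder.
    static byte[] twoPhaseOnAppendOnlyStream(byte[] payload) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(payload, 0, payload.length);
        long checksum = crc32(payload, payload.length);
        out.write(longBytes(checksum - 1), 0, 8); // phase 1 placeholder
        // seek(pos) would happen here; an append-only store ignores it
        out.write(longBytes(checksum), 0, 8);     // phase 2 appended, not overwritten
        return out.toByteArray();
    }

    // Forward-only alternative: write the checksum exactly once, at the end.
    static byte[] forwardOnly(byte[] payload) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(payload, 0, payload.length);
        out.write(longBytes(crc32(payload, payload.length)), 0, 8);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] payload = {1, 2, 3, 4, 5};
        System.out.println("ignored seek: " + checksumMatches(twoPhaseOnAppendOnlyStream(payload)));
        System.out.println("forward-only: " + checksumMatches(forwardOnly(payload)));
    }
}
```

A persisted file that is 8 bytes longer than expected would be a quick sign of the first case.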

Thank you,

On 26.06.2012 13:30, Michael McCandless wrote:
> Hmm, the checksum is there to ensure all bits were persisted properly.
> But one tricky part is that we first write four 0 bytes, then seek back and
> write the checksum over those 4 bytes.  Could it be that the HBase
> IndexOutput impl can't handle seeking back and overwriting?
> If so, you should have a look at AppendingCodec, which fixes the
> places in Lucene's default codec that seek backwards on write ...
> Mike McCandless
> On Mon, Jun 25, 2012 at 11:55 AM, Mihai Soloi <> wrote:
>> Hello everybody,
>> I'm Mihai, a GSoC student, and I'm implementing an HBaseDirectory for Lucene
>> [1] to use for James mailbox indexing. I've implemented
>> HIndexOutput/Input, and they persist the segments file just fine in an
>> HBase table, but when I try to get an IndexWriter from my directory, it
>> reads the segments_N file and the checksum check in SegmentInfos fails: the
>> checksum computed while reading differs from the persisted one. I've tried
>> to find a solution but haven't managed to. Do you have any idea why this
>> happens? This is the stack trace:
>> org.apache.lucene.index.CorruptIndexException: checksum mismatch in segments file (resource: ChecksumIndexInput(anonymous IndexInput))
>>     at
>>     at org.apache.lucene.index.IndexFileDeleter.<init>(
>>     at org.apache.lucene.index.IndexWriter.<init>(
>>     at org.apache.james.mailbox.lucene.hbase.IndexingTest.getWriter(
>>     at org.apache.james.mailbox.lucene.hbase.IndexingTest.testIndexWriter(
>> [1]
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> For additional commands, e-mail:
