lucene-dev mailing list archives

From Mihai Soloi <mihai.so...@gmail.com>
Subject Re: Checksum mismatch in segments file
Date Tue, 26 Jun 2012 18:29:42 GMT
Hello Mike and Robert,

I am using the stable version of Lucene (3.6). What actually happens is
that the checksum (a long) is written as 8 bytes: the first 4 are 0, and
the deliberately mismatched value (checksum - 1) is written in the next 4
(see ChecksumIndexOutput.prepareCommit()). When finishCommit() runs, the
correct checksum is written over it in the buffer, and on close it is
flushed to the directory.
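
The write sequence described above can be modelled outside Lucene. The sketch below is a hypothetical in-memory stand-in (SeekableBuffer is not Lucene's IndexOutput, and all names are illustrative); it assumes a CRC32 checksum, as ChecksumIndexOutput uses in 3.x, and a big-endian writeLong(). It shows the placeholder (checksum - 1) being written, the seek back, and the overwrite, and why the top 4 bytes of the trailer are 0. In the 3.6 source, prepareCommit() also flushes the underlying output before seeking back, so the overwrite can land on bytes that were already flushed; this sketch has no separate persisted layer and skips that step.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.zip.CRC32;

/**
 * Hypothetical in-memory model of the two-phase checksum trailer.
 * SeekableBuffer is NOT Lucene's IndexOutput; it only mimics seek() and writeLong().
 */
class TwoPhaseChecksumDemo {

    /** Minimal growable byte sink that supports seeking back and overwriting. */
    static final class SeekableBuffer {
        private byte[] data = new byte[0];
        private int pos = 0;

        void writeBytes(byte[] b) {
            if (pos + b.length > data.length) {
                data = Arrays.copyOf(data, pos + b.length); // grow only when writing past the end
            }
            System.arraycopy(b, 0, data, pos, b.length);    // overwrites in place when pos < data.length
            pos += b.length;
        }

        void writeLong(long v) {
            // ByteBuffer defaults to big-endian, matching DataOutput.writeLong
            writeBytes(ByteBuffer.allocate(Long.BYTES).putLong(v).array());
        }

        long getFilePointer() { return pos; }

        void seek(long p) { pos = (int) p; }

        byte[] toByteArray() { return data.clone(); }
    }

    /** Phase 1: write a deliberately wrong trailer (checksum - 1) and seek back over it. */
    static void prepareCommit(SeekableBuffer out, long checksum) {
        long start = out.getFilePointer();
        out.writeLong(checksum - 1);
        out.seek(start);
    }

    /** Phase 2: overwrite the placeholder with the real checksum. */
    static void finishCommit(SeekableBuffer out, long checksum) {
        out.writeLong(checksum);
    }

    /** Writes body + trailer; if finish is false, only phase 1 runs. */
    static byte[] writeWithTrailer(byte[] body, boolean finish) {
        CRC32 crc = new CRC32();
        crc.update(body, 0, body.length);
        long checksum = crc.getValue(); // CRC32 fits in 32 bits, so the long's top 4 bytes are 0

        SeekableBuffer out = new SeekableBuffer();
        out.writeBytes(body);
        prepareCommit(out, checksum);
        if (finish) {
            finishCommit(out, checksum);
        }
        return out.toByteArray();
    }

    /** Reads the last 8 bytes as a big-endian long. */
    static long trailer(byte[] file) {
        return ByteBuffer.wrap(file, file.length - Long.BYTES, Long.BYTES).getLong();
    }

    public static void main(String[] args) {
        byte[] body = {1, 2, 3, 4, 5};
        System.out.println("after prepareCommit: " + trailer(writeWithTrailer(body, false)));
        System.out.println("after finishCommit:  " + trailer(writeWithTrailer(body, true)));
    }
}
```

The two printed values differ by exactly 1, which is the placeholder-versus-final difference the prepareCommit()/finishCommit() pair relies on.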

A comment in the code states that the mismatched value is written for
testing purposes. I've followed the code in the debugger and logged the
bytes, and I can confirm that seeking back and overwriting work as they
should.
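
On the read side, the exception in the quoted stack trace comes from SegmentInfos.read(), which (in the 3.x code) compares a checksum recomputed while reading the file with the 8-byte value stored at its end. A hypothetical standalone version of that comparison can help narrow the problem down; the class and method names here are illustrative, not Lucene's API, and CRC32 over every byte before the trailer is assumed. If the stored trailer turns out to be exactly one less than the recomputed value, the phase-1 placeholder reached the store instead of the final checksum.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

/**
 * Hypothetical read-side check modelled on the segments-file verification:
 * CRC32 over every byte before the 8-byte trailer, compared with the trailer
 * read as a big-endian long. Not Lucene's API.
 */
class SegmentsChecksumCheck {

    /** CRC32 of everything except the trailing 8 bytes. */
    static long computed(byte[] file) {
        if (file.length < Long.BYTES) {
            throw new IllegalArgumentException("file too short for an 8-byte checksum trailer");
        }
        CRC32 crc = new CRC32();
        crc.update(file, 0, file.length - Long.BYTES);
        return crc.getValue();
    }

    /** The trailer as it would have been written by writeLong (big-endian). */
    static long stored(byte[] file) {
        return ByteBuffer.wrap(file, file.length - Long.BYTES, Long.BYTES).getLong();
    }

    static boolean checksumMatches(byte[] file) {
        return computed(file) == stored(file);
    }

    /** -1 here would mean the checksum - 1 placeholder was persisted, not the final value. */
    static long storedMinusComputed(byte[] file) {
        return stored(file) - computed(file);
    }

    /** Builds body + trailer where trailer = crc + delta (delta = -1 mimics a lost finishCommit()). */
    static byte[] withTrailer(byte[] body, long delta) {
        CRC32 crc = new CRC32();
        crc.update(body, 0, body.length);
        return ByteBuffer.allocate(body.length + Long.BYTES)
                .put(body)
                .putLong(crc.getValue() + delta)
                .array();
    }

    public static void main(String[] args) {
        byte[] stale = withTrailer(new byte[] {10, 20, 30}, -1);
        System.out.println("matches=" + checksumMatches(stale)
                + ", stored - computed = " + storedMinusComputed(stale));
    }
}
```

Running main prints `matches=false, stored - computed = -1` for the simulated stale file; applying checksumMatches() and storedMinusComputed() to the bytes actually read back from HBase would show whether the same pattern occurs there.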

You can run the test with 'mvn test
-Dtest=org.apache.james.mailbox.lucene.hbase.IndexingTest', but expect a
lot of printed bytes.

I am now looking at AppendingCodec in version 4 to see whether I can make
better use of that implementation.
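
Mike's suggestion (quoted below) rests on AppendingCodec avoiding backward seeks on write. The hypothetical AppendOnlyOutput sketched here (not a Lucene class; all names are illustrative) models a store that can only append: the placeholder-then-overwrite pattern fails on it, while computing the checksum first and writing the trailer exactly once succeeds. It illustrates the constraint only and says nothing about how AppendingCodec itself is implemented.

```java
import java.io.ByteArrayOutputStream;

/**
 * Hypothetical append-only output: the kind of constraint a store without
 * in-place overwrite imposes. Backward seeks are rejected outright.
 */
class AppendOnlyOutput {
    private final ByteArrayOutputStream buf = new ByteArrayOutputStream();

    void writeLong(long v) {
        for (int shift = 56; shift >= 0; shift -= 8) {
            buf.write((int) (v >>> shift)); // big-endian; write(int) keeps only the low 8 bits
        }
    }

    long getFilePointer() { return buf.size(); }

    void seek(long pos) {
        if (pos != buf.size()) {
            throw new UnsupportedOperationException(
                    "append-only: cannot seek from " + buf.size() + " to " + pos);
        }
    }

    byte[] toByteArray() { return buf.toByteArray(); }

    /** Mirrors the placeholder-then-overwrite pattern; returns false when the backward seek is rejected. */
    static boolean twoPhaseTrailerSucceeds(long checksum) {
        AppendOnlyOutput out = new AppendOnlyOutput();
        long start = out.getFilePointer();
        out.writeLong(checksum - 1);
        try {
            out.seek(start);          // backward seek: rejected by this output
            out.writeLong(checksum);
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    /** Write-once alternative: the checksum is already known, so the trailer is appended once. */
    static byte[] singlePassTrailer(long checksum) {
        AppendOnlyOutput out = new AppendOnlyOutput();
        out.writeLong(checksum);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println("two-phase succeeds: " + twoPhaseTrailerSucceeds(42L));
        System.out.println("single-pass length: " + singlePassTrailer(42L).length);
    }
}
```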

Thank you,
Mihai


On 26.06.2012 13:30, Michael McCandless wrote:
> Hmm, the checksum is there to ensure all bits were persisted properly.
>
> But one tricky part is that we first write four 0 bytes, then seek back
> and write the checksum over those 4 bytes. Could it be that the HBase
> IndexOutput impl can't handle seeking back and overwriting?
>
> If so, you should have a look at AppendingCodec, which fixes the
> places in Lucene's default codec that seek backwards on write ...
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Mon, Jun 25, 2012 at 11:55 AM, Mihai Soloi <mihai.soloi@gmail.com> wrote:
>> Hello everybody,
>>
>> I'm Mihai, a GSoC student, and I'm implementing an HBaseDirectory for Lucene
>> [1] to use for James mailbox indexing. I've implemented HIndexOutput and
>> HIndexInput, and they persist the segments file to an HBase table just fine,
>> but when I try to open an IndexWriter on my directory, it reads the
>> segments_N file and the check in SegmentInfos fails: the computed checksum
>> differs from the persisted one. I've tried to find a solution but haven't
>> reached one. Do you have any idea why this happens? This is the stack trace:
>>
>> org.apache.lucene.index.CorruptIndexException: checksum mismatch in segments file (resource: ChecksumIndexInput(anonymous IndexInput))
>>     at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:335)
>>     at org.apache.lucene.index.IndexFileDeleter.<init>(IndexFileDeleter.java:182)
>>     at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1168)
>>     at org.apache.james.mailbox.lucene.hbase.IndexingTest.getWriter(IndexingTest.java:82)
>>     at org.apache.james.mailbox.lucene.hbase.IndexingTest.testIndexWriter(IndexingTest.java:123)
>>
>> [1] http://code.google.com/a/apache-extras.org/p/mailbox-lucene-index-hbase/
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>




