lucene-java-user mailing list archives

From Michael McCandless
Subject Re: Hardcoded checksum mechanism in BlockTreeTermsReader
Date Tue, 06 Dec 2016 10:36:07 GMT
We have learned over time not to trust the underlying store to
correctly record the bytes we wrote to it.

This is why checksumming is very strongly built into Lucene at this
point.  If you disable checksumming, when bits do flip, you get exotic
exceptions at search time that might look like Lucene bugs and can
cost a lot of time to explain.

It's not just the BlockTreeTermsReader; many other codec components
check the checksum with CodecUtil.checkFooter at search time.
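
To illustrate the idea, here is a minimal sketch that uses only the JDK. It is not Lucene's code: the class name ChecksumFooterSketch and its two helper methods are hypothetical, and the 8-byte footer is a simplification of what Lucene actually writes. It only shows why a CRC32 stored at the end of a file lets a reader notice a flipped bit before that bit turns into a confusing exception somewhere else.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical illustration only; the names and footer layout are assumptions,
// not Lucene's real on-disk format.
class ChecksumFooterSketch {
    // Simplified footer: just the CRC32 value stored as an 8-byte long.
    static final int FOOTER_LENGTH = Long.BYTES;

    // Writer side: append the CRC32 of the payload after the payload.
    static byte[] appendFooter(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        ByteBuffer out = ByteBuffer.allocate(payload.length + FOOTER_LENGTH);
        out.put(payload);
        out.putLong(crc.getValue());
        return out.array();
    }

    // Reader side: recompute the CRC32 over the payload and compare it
    // with the value stored in the footer.
    static boolean verifyFooter(byte[] file) {
        if (file.length < FOOTER_LENGTH) {
            return false; // too short to contain a footer at all
        }
        int payloadLength = file.length - FOOTER_LENGTH;
        CRC32 crc = new CRC32();
        crc.update(file, 0, payloadLength);
        long stored = ByteBuffer.wrap(file, payloadLength, FOOTER_LENGTH).getLong();
        return stored == crc.getValue();
    }

    public static void main(String[] args) {
        byte[] file = appendFooter("terms index bytes".getBytes(StandardCharsets.UTF_8));
        System.out.println("intact:  " + verifyFooter(file)); // prints "intact:  true"

        file[3] ^= 0x01; // simulate the store flipping a single bit
        System.out.println("flipped: " + verifyFooter(file)); // prints "flipped: false"
    }
}
```

With a check like this, corruption is reported as a checksum mismatch on a specific file, rather than as an unrelated failure deep inside a reader.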

Can you explain why it's necessary to remove it for your
database-file-based Directory?

Mike McCandless

On Tue, Dec 6, 2016 at 5:25 AM, Duke DAI wrote:
> Hi all,
> I'm customizing a Lucene Directory that is backed by database files. I do
> not need a second checksum on IndexInput and IndexOutput.
> But in the BlockTreeTermsReader constructor, the following code opens a
> hard-coded BufferedChecksumIndexInput to checksum the raw IndexInput. I have
> to use CRC32 on IndexOutput to get through it. Is there a more graceful
> way to do checksumming, such as letting the Directory construct a checksum
> instance instead of going through the Directory.openChecksumInput API?
>       String indexName = IndexFileNames.segmentFileName(segment,
> state.segmentSuffix, TERMS_INDEX_EXTENSION);
>       indexIn =, state.context);
>       CodecUtil.checkIndexHeader(indexIn, TERMS_INDEX_CODEC_NAME, version,
> version, state.segmentInfo.getId(), state.segmentSuffix);
>       CodecUtil.checksumEntireFile(indexIn);
> Best regards,
> Duke
> If not now, when? If not me, who?
