lucene-dev mailing list archives

From "Simon Willnauer (JIRA)" <>
Subject [jira] [Commented] (LUCENE-5583) Should BufferedChecksumIndexInput have its own buffer?
Date Wed, 09 Apr 2014 12:25:15 GMT


Simon Willnauer commented on LUCENE-5583:

I personally think you shouldn't pass this shared buffer to readBytes(); it can break all delegates.
I wonder if we want to add a skipBytes method to DataInput that we can implement efficiently at
the lower levels, with a default implementation that just calls readByte() in a loop?
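
As a rough illustration of the suggestion above, here is a minimal Java sketch. {{SketchDataInput}} and {{ByteArraySketchInput}} are hypothetical stand-ins, not Lucene's actual classes; only the names {{readByte()}} and {{skipBytes}} come from the comment. The base class falls back to calling {{readByte()}} in a loop, and a lower-level implementation overrides {{skipBytes}} to move its position directly.

```java
import java.io.EOFException;
import java.io.IOException;

// Hypothetical stand-in for DataInput, reduced to the two methods discussed.
abstract class SketchDataInput {
  public abstract byte readByte() throws IOException;

  // Default implementation: consume bytes one at a time.
  // No shared scratch buffer is involved.
  public void skipBytes(long numBytes) throws IOException {
    if (numBytes < 0) {
      throw new IllegalArgumentException("numBytes must be >= 0: " + numBytes);
    }
    for (long skipped = 0; skipped < numBytes; skipped++) {
      readByte();
    }
  }
}

// Hypothetical in-memory input that can skip without touching the bytes.
final class ByteArraySketchInput extends SketchDataInput {
  private final byte[] data;
  private int pos;

  ByteArraySketchInput(byte[] data) {
    this.data = data;
  }

  @Override
  public byte readByte() throws IOException {
    if (pos >= data.length) {
      throw new EOFException("read past end of input");
    }
    return data[pos++];
  }

  // Efficient override: advance the position instead of reading byte by byte.
  @Override
  public void skipBytes(long numBytes) throws IOException {
    if (numBytes < 0) {
      throw new IllegalArgumentException("numBytes must be >= 0: " + numBytes);
    }
    if (numBytes > data.length - pos) {
      throw new EOFException("skip past end of input");
    }
    pos += (int) numBytes;
  }
}
```

A checksumming wrapper would presumably keep the default (or an equivalent that feeds every skipped byte to the digest), since it cannot jump over bytes it still has to hash.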

> Should BufferedChecksumIndexInput have its own buffer?
> ------------------------------------------------------
>                 Key: LUCENE-5583
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: 4.8
>            Reporter: Adrien Grand
> I was playing with on-the-fly checksum verification and this made me stumble upon an
> issue with {{BufferedChecksumIndexInput}}.
> I have some code that skips over a {{DataInput}} by reading bytes into /dev/null, e.g.:
> {code}
>   private static final byte[] SKIP_BUFFER = new byte[1024];
>   private static void skipBytes(DataInput in, long numBytes) throws IOException {
>     assert numBytes >= 0;
>     for (long skipped = 0; skipped < numBytes; ) {
>       final int toRead = (int) Math.min(numBytes - skipped, SKIP_BUFFER.length);
>       in.readBytes(SKIP_BUFFER, 0, toRead);
>       skipped += toRead;
>     }
>   }
> {code}
> It is fine to read into this static buffer, even from multiple threads, since the content
> that is read doesn't matter here. However, it breaks with {{BufferedChecksumIndexInput}} because
> of the way that it updates the checksum:
> {code}
>   @Override
>   public void readBytes(byte[] b, int offset, int len)
>     throws IOException {
>     main.readBytes(b, offset, len);
>     digest.update(b, offset, len);
>   }
> {code}
> If you are unlucky and a concurrent call to {{skipBytes}} starts modifying
> the contents of {{b}} before the call to {{digest.update(b, offset, len)}} has finished, then your
> checksum will be wrong.
> I think we should make {{BufferedChecksumIndexInput}} read into a private buffer first
> instead of relying on the user-provided buffer.
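
The private-buffer idea from the issue description can be sketched as follows. This is only an illustration under stated assumptions, not the actual Lucene change: {{BytesSource}} and {{PrivateBufferChecksumInput}} are hypothetical names, the 256-byte buffer size is arbitrary, and {{java.util.zip.CRC32}} stands in for whatever digest is really used. The wrapper reads into a buffer that only it can see, updates the checksum from that buffer, and only then copies the bytes into the caller's array, so concurrent writes to the caller's array cannot affect the checksum.

```java
import java.io.IOException;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Hypothetical minimal view of the delegate input.
interface BytesSource {
  void readBytes(byte[] b, int offset, int len) throws IOException;
}

// Hypothetical wrapper that checksums from its own buffer.
final class PrivateBufferChecksumInput implements BytesSource {
  private final BytesSource main;
  private final Checksum digest = new CRC32();
  // Owned by this instance only; never handed out to callers.
  private final byte[] buffer = new byte[256];

  PrivateBufferChecksumInput(BytesSource main) {
    this.main = main;
  }

  @Override
  public void readBytes(byte[] b, int offset, int len) throws IOException {
    while (len > 0) {
      final int chunk = Math.min(len, buffer.length);
      main.readBytes(buffer, 0, chunk);              // read into the private buffer
      digest.update(buffer, 0, chunk);               // checksum the private copy
      System.arraycopy(buffer, 0, b, offset, chunk); // then hand bytes to the caller
      offset += chunk;
      len -= chunk;
    }
  }

  public long getChecksum() {
    return digest.getValue();
  }
}
```

The trade-off is one extra copy per read. Like the original, the sketch assumes each wrapper instance is used by a single thread; only the destination array may be shared.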
