hadoop-common-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-9307) BufferedFSInputStream.read returns wrong results after certain seeks
Date Thu, 14 Feb 2013 08:52:12 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13578244#comment-13578244 ]

Todd Lipcon commented on HADOOP-9307:
-------------------------------------

An example sequence of seeks which returns the wrong data is as follows, assuming a 4096-byte
buffer:

{code}
seek(0);
readFully(1);
{code}

This primes the buffer and consumes one byte from it. After this, the current state of the
buffered stream is {{pos=1, count=4096, filepos=4096}}.

{code}
seek(2000);
{code}

The seek sees that the required data is already in the buffer, and just sets {{pos=2000}}.

{code}
readFully(10000);
{code}

This first copies the remaining 2096 bytes from the buffer and sets {{pos=4096}}. Then, because
7904 bytes are remaining, and this is larger than the buffer size, it copies them directly
from the underlying stream into the user-supplied output buffer, bypassing the internal buffer.
This leaves the state of the stream at {{pos=4096, count=4096, filepos=12000}}. The buffer
itself still holds bytes 0-4096 of the file.

{code}
seek(11000);
{code}

The "optimization" in BufferedFSInputStream sees that there are 4096 buffered bytes, and that
this seek is supposedly within the window, assuming that those 4096 bytes directly precede
filepos. So, it erroneously just sets {{pos=3096}}.
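
The arithmetic behind that bad seek can be sketched as below. This is only an illustration of the check as described in this comment, not the actual Hadoop source; the class name {{SeekWindowSketch}}, the method {{bufferPosFor}} and the variable {{assumedStart}} are made up for the example.

```java
class SeekWindowSketch {
    // Hypothetical helper: returns the buffer position the flawed check would choose,
    // or -1 if the target lies outside the window it assumes.
    static long bufferPosFor(long target, long filepos, int count) {
        // Flawed assumption: the buffer holds the `count` bytes immediately before filepos.
        long assumedStart = filepos - count;
        if (target >= assumedStart && target < filepos) {
            return target - assumedStart;
        }
        return -1;
    }

    public static void main(String[] args) {
        // State after readFully(10000): count=4096, filepos=12000.
        // The assumed window is [7904, 12000), so seek(11000) is treated as a buffer hit.
        System.out.println(bufferPosFor(11000, 12000, 4096));
    }
}
```

With {{filepos=12000}} and {{count=4096}} the assumed window starts at 7904, so the target 11000 maps to buffer position 11000 - 7904 = 3096, even though the buffer really holds bytes 0-4096.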

The next read will then get the wrong results for the first 1000 bytes -- yielding bytes 3096-4096
of the file instead of bytes 11000-12000.
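
For anyone who wants to step through the sequence, here is a minimal, self-contained model of it. It is a sketch under assumptions rather than the real BufferedFSInputStream: the class {{BuggySeekDemo}}, the in-memory {{FILE}} array and the simplified {{seek}}/{{readFully}} logic are all hypothetical stand-ins that only mimic the behaviour described in this comment.

```java
import java.util.Arrays;

// Simplified, hypothetical model of the stream state described in this comment.
class BuggySeekDemo {
    static final int BUF_SIZE = 4096;
    // Stand-in for the file: 16 KiB, byte i holds i % 251 so distinct regions differ.
    static final byte[] FILE = new byte[16384];
    static {
        for (int i = 0; i < FILE.length; i++) FILE[i] = (byte) (i % 251);
    }

    final byte[] buf = new byte[BUF_SIZE];
    int pos;       // next byte to return from buf
    int count;     // number of valid bytes in buf
    long filepos;  // offset of the underlying file

    // Flawed shortcut: assumes buf always holds the `count` bytes just before filepos.
    void seek(long target) {
        long assumedStart = filepos - count;
        if (target >= assumedStart && target < filepos) {
            pos = (int) (target - assumedStart);
            return;
        }
        filepos = target;
        pos = 0;
        count = 0;
    }

    // No EOF handling beyond a basic check; enough for the sequence in main().
    void readFully(byte[] out) {
        int done = 0;
        while (done < out.length) {
            int avail = count - pos;
            if (avail <= 0) {
                int want = out.length - done;
                if (want >= BUF_SIZE) {
                    // Large read: bypass the buffer. buf keeps its old contents
                    // while filepos moves on, which is what breaks the seek check.
                    System.arraycopy(FILE, (int) filepos, out, done, want);
                    filepos += want;
                    return;
                }
                // Small read: refill the buffer from the current file position.
                count = (int) Math.min(BUF_SIZE, FILE.length - filepos);
                if (count <= 0) throw new IllegalStateException("EOF");
                System.arraycopy(FILE, (int) filepos, buf, 0, count);
                filepos += count;
                pos = 0;
                avail = count;
            }
            int n = Math.min(avail, out.length - done);
            System.arraycopy(buf, pos, out, done, n);
            pos += n;
            done += n;
        }
    }

    // Replays the seek/read sequence from this comment and returns the last read.
    static byte[] run() {
        BuggySeekDemo in = new BuggySeekDemo();
        in.seek(0);
        in.readFully(new byte[1]);
        in.seek(2000);
        in.readFully(new byte[10000]);
        in.seek(11000);
        byte[] got = new byte[1000];
        in.readFully(got);
        return got;
    }

    public static void main(String[] args) {
        byte[] got = run();
        System.out.println("matches bytes 11000-12000: "
                + Arrays.equals(got, Arrays.copyOfRange(FILE, 11000, 12000)));
        System.out.println("matches bytes 3096-4096:   "
                + Arrays.equals(got, Arrays.copyOfRange(FILE, 3096, 4096)));
    }
}
```

In this model the final read matches the stale buffer contents (bytes 3096-4096) rather than bytes 11000-12000. One possible remedy, again only as an assumption about the fix, would be to take the shortcut only when the buffer is known to end at {{filepos}}, and otherwise discard the buffer on seek.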
                
> BufferedFSInputStream.read returns wrong results after certain seeks
> --------------------------------------------------------------------
>
>                 Key: HADOOP-9307
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9307
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 1.1.1, 2.0.2-alpha
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>
> After certain sequences of seek/read, BufferedFSInputStream can silently return data
> from the wrong part of the file. Further description in first comment below.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
