tika-dev mailing list archives

From "Keith R. Bennett (JIRA)" <j...@apache.org>
Subject [jira] Updated: (TIKA-45) RereadableInputStream needs to be able to read to the end of the original stream on first rewind.
Date Sat, 06 Oct 2007 14:10:50 GMT

     [ https://issues.apache.org/jira/browse/TIKA-45?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Keith R. Bennett updated TIKA-45:
---------------------------------

    Attachment: RereadableInputStreamTest.java
                RereadableInputStream.java
                tika45.patch

I've attached both a patch and the patched source files, the latter for your convenience in viewing.

Changes to the RereadableInputStream include:

* Addressed this issue: by default, the first rewind now reads to the end of the original
input stream. A new constructor takes a boolean value specifying whether or not to do this.

* Added javadoc.
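
To illustrate the flag described above, here is a minimal, memory-only sketch in Java. It is
not the attached RereadableInputStream: the class name FlaggedRereadableStream, the parameter
name readToEndOnFirstRewind, and the in-memory store are all assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical, memory-only illustration; not the attached class. */
class FlaggedRereadableStream extends InputStream {
    private InputStream original;   // null once the first pass is over
    private final ByteArrayOutputStream store = new ByteArrayOutputStream();
    private final boolean readToEndOnFirstRewind;
    private byte[] saved;           // null during the first pass
    private int position;           // read position within saved

    /** Defaults to reading to the end of the original on first rewind. */
    FlaggedRereadableStream(InputStream original) {
        this(original, true);
    }

    FlaggedRereadableStream(InputStream original, boolean readToEndOnFirstRewind) {
        this.original = original;
        this.readToEndOnFirstRewind = readToEndOnFirstRewind;
    }

    @Override
    public int read() throws IOException {
        if (saved != null) {
            // Second and later passes: serve bytes from the store only.
            return position < saved.length ? (saved[position++] & 0xFF) : -1;
        }
        if (original == null) {
            return -1;              // closed before any rewind
        }
        int b = original.read();
        if (b != -1) {
            store.write(b);         // first pass: save every byte read
        }
        return b;
    }

    public void rewind() throws IOException {
        if (saved == null) {
            if (readToEndOnFirstRewind && original != null) {
                // Drain the unread remainder so later passes see the whole stream.
                int b;
                while ((b = original.read()) != -1) {
                    store.write(b);
                }
            }
            close();
            saved = store.toByteArray();
        }
        position = 0;
    }

    @Override
    public void close() throws IOException {
        if (original != null) {
            original.close();
            original = null;
        }
    }
}
```

With the flag set to false, a caller that reads 2 bytes of a 10-byte stream and then rewinds
sees only those 2 bytes on the next pass; with the default, it sees all 10.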

Thanks to Chris Mattmann for his suggestion regarding this issue.

As you can see, this class has a unit test, but given its importance, more testing would be
a Good Thing.

I'm pasting here a TODO comment from the file because it describes what I think is a better
solution to the problem:

    // TODO: At some point it would be better to replace the current approach
    // (specifying the above) with more automated behavior.  The stream could
    // keep the original stream open until EOF was reached.  For example, if:
    //
    // the original stream is 10 bytes, and
    // only 2 bytes are read on the first pass
    // rewind() is called
    // 5 bytes are read
    //
    // In this case, this instance gets the first 2 from its store,
    // and the next 3 from the original stream, saving those additional 3
    // bytes in the store.  In this way, only the maximum number of bytes
    // ever needed must be saved in the store; unused bytes are never read.
    // The original stream is closed when EOF is reached, or when close()
    // is called, whichever comes first.  Using this approach eliminates
    // the need to specify the flag (though makes implementation more complex).
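
The approach in the TODO comment above can be sketched roughly as follows. This is only an
illustration under stated assumptions, not code from the patch: the class name
LazyRereadableStream and the method storedByteCount() are hypothetical, and the store is held
in memory rather than in a file.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical sketch of the TODO's approach; not the attached class. */
class LazyRereadableStream extends InputStream {
    private InputStream original;           // null once EOF is reached or close() is called
    private final ByteArrayOutputStream store = new ByteArrayOutputStream();
    private byte[] snapshot = new byte[0];  // store contents as of the last rewind()
    private int position = 0;               // read position within snapshot

    LazyRereadableStream(InputStream original) {
        this.original = original;
    }

    @Override
    public int read() throws IOException {
        // Serve bytes that are already saved before touching the original.
        if (position < snapshot.length) {
            return snapshot[position++] & 0xFF;
        }
        if (original == null) {
            return -1;
        }
        int b = original.read();
        if (b == -1) {
            close();                        // EOF: the original is no longer needed
        } else {
            store.write(b);                 // save only bytes that were actually requested
        }
        return b;
    }

    /** Restart reading from the first byte; the original stays open. */
    public void rewind() {
        snapshot = store.toByteArray();
        position = 0;
    }

    /** Number of bytes saved so far (hypothetical helper for inspection). */
    public int storedByteCount() {
        return store.size();
    }

    @Override
    public void close() throws IOException {
        if (original != null) {
            original.close();
            original = null;
        }
    }
}
```

In the 10-byte example from the comment, reading 2 bytes, calling rewind(), and reading 5
bytes leaves exactly 5 bytes in the store; the remaining 5 are never read unless a later
pass asks for them.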

- Keith

> RereadableInputStream needs to be able to read to the end of the original stream on first
> rewind.
> -------------------------------------------------------------------------------------------------
>
>                 Key: TIKA-45
>                 URL: https://issues.apache.org/jira/browse/TIKA-45
>             Project: Tika
>          Issue Type: Improvement
>          Components: general
>    Affects Versions: 0.1-incubator
>            Reporter: Keith R. Bennett
>             Fix For: 0.1-incubator
>
>         Attachments: RereadableInputStream.java, RereadableInputStreamTest.java, tika45.patch
>
>
> RereadableInputStream reads a stream's content into a store (memory or file) on its first
> pass.  If rewind() is called before end of stream is reached, the bytes not yet read will
> not be available on subsequent reads of the RereadableInputStream.  This could be a problem,
> for example, if a parser uses it to get metadata from the beginning of a stream and calls
> rewind(), expecting to get the entire document.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

