uima-dev mailing list archives

From Marshall Schor <...@schor.com>
Subject Re: [jira] Commented: (UIMA-210) faulty use of .read(buffer...) in several places - not checking for fewer than expected bytes/chars read
Date Mon, 12 Feb 2007 18:55:32 GMT
Adam Lally wrote:
>
> Was anything wrong with the original code, Marshall?  It was checking
> the return value of read appropriately, it looks like.  
Yes, that's correct, I think.
> So the only
> change here is to increase the size of the buffer to be the size of
> the file, which isn't necessary and could blow up the memory
> consumption when reading large files.
>

I think when you look at the details of the two implementations and take
into account how StringBuffer works, both end up with a buffer the size
of the file (assuming a one-byte-per-character encoding). The approach
using the StringBuffer, however, also creates and throws away many
smaller versions of the buffer as it incrementally expands to the full
file size.

There is a point to be made about the buffer being too big for
encodings that use multiple bytes per char: a file of N bytes then
decodes to fewer than N chars, leaving part of the buffer unused.
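A small sketch of the size mismatch, assuming UTF-8 as the multi-byte encoding and using an in-memory string in place of a file. The class `ByteVsCharLength` and its methods are hypothetical helpers for illustration only.

```java
import java.nio.charset.StandardCharsets;

final class ByteVsCharLength {
    // Encodes text as UTF-8 (standing in for the file's bytes), decodes it
    // again, and returns {byte count, char count}.
    static int[] lengths(String text) {
        byte[] fileBytes = text.getBytes(StandardCharsets.UTF_8);
        String decoded = new String(fileBytes, StandardCharsets.UTF_8);
        return new int[] { fileBytes.length, decoded.length() };
    }

    // How many slots of a char[] sized to the byte count would stay unused.
    static int unusedChars(String text) {
        int[] l = lengths(text);
        return l[0] - l[1];
    }
}
```

For example, `lengths("na\u00efve caf\u00e9")` gives 12 bytes but only 10 chars, because `ï` and `é` each take two bytes in UTF-8, so a char buffer sized to the byte count would have 2 unused slots. For plain ASCII text the two counts are equal.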

The other advantage, of course, of doing it this way is that the buffer
allows reading as much of the file as possible in a single read call,
which could be a significant performance improvement. But these are all
just minor tweaks; at some point we should shift to using the NIO
techniques. Perhaps, as Thilo suggests, it would be good to use methods
from the Commons IO project.
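One possible shape for an NIO version, offered as a sketch rather than what was adopted: read the file through a `java.nio.channels.FileChannel` into a single `ByteBuffer` sized to the file, then decode with a `Charset`. The class `NioRead` is hypothetical, and the `try`-with-resources syntax assumes Java 7 or later. Commons IO offers ready-made helpers such as `FileUtils.readFileToString`, which are not shown here to keep the sketch dependency-free.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;

final class NioRead {
    // Reads a whole file through a FileChannel into one ByteBuffer sized to
    // the file, then decodes the bytes with the given charset.
    static String readFile(String fileName, Charset charset) throws IOException {
        try (FileInputStream in = new FileInputStream(fileName);
             FileChannel channel = in.getChannel()) {
            long size = channel.size();
            if (size > Integer.MAX_VALUE) {
                throw new IOException("file too large for a single buffer: " + fileName);
            }
            ByteBuffer bytes = ByteBuffer.allocate((int) size);
            // Like Reader.read, FileChannel.read may transfer fewer bytes than
            // requested, so keep reading until the buffer is full or EOF.
            while (bytes.hasRemaining()) {
                if (channel.read(bytes) == -1) {
                    break;
                }
            }
            bytes.flip(); // switch the buffer from filling to draining
            // The resulting String holds exactly the decoded chars, so a
            // multi-byte encoding does not leave unused slots in the result.
            return charset.decode(bytes).toString();
        }
    }
}
```

Sizing the byte buffer to the file keeps the single-read advantage, while decoding through the `Charset` sidesteps the oversized char buffer for multi-byte encodings.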

-Marshall
