uima-dev mailing list archives

From Marshall Schor <...@schor.com>
Subject Re: [jira] Closed: (UIMA-210) faulty use of .read(buffer...) in several places - not checking for fewer than expected bytes/chars read
Date Mon, 12 Feb 2007 19:02:23 GMT
Adam Lally wrote:
>> I think this is fine.
>> Here's my thinking on the pros/cons:
>> Heap used:
>> The overall heap consumed by each method, for a file N bytes long, 
>> ignoring the heap consumed by the result, which would be the same 
>> in both cases:
>> Current: 10,000 chars for buf + approx N to N/2 (depending on 
>> encoding) chars in string buf + maybe lots of garbage as string buf 
>> is repeatedly expanded (estimated as approx N to N/2). One way to 
>> reduce this is to get the file length in bytes and preallocate the 
>> string buffer to, say, N/2.
>> Previous:  N chars in buf
> IIRC the StringBuffer "cheats" and doesn't reallocate the memory
> again when you call toString() on it (an advantage of being in the
> java.lang package I guess, user code can't do that)... unless you
> subsequently append more to the buffer.  If true then the "previous"
> approach has an additional N to N/2 chars in the String itself, which
> the current approach does not have.
>> So for large files, the previous could be wasteful by overallocating 
>> the buf when a multi-byte character encoding is used, and the current 
>> is wasteful in that the string buffer is reallocated repeatedly.
> But what about a file that was, say, 100 MB, regardless of character
> encoding?  Surely it is wasteful to allocate a 100 million character
> array as temporary storage and then also allocate about that much (or
> half that much) again for the String itself.

I agree with you, if, as you say, the StringBuffer "cheats".  I 
presumed, perhaps incorrectly, that it made a copy of the underlying 
char array object.  The JavaDocs imply that sharing, rather than 
copying, is what happens:

Implementation advice: This method can be coded so as to create a new 
String object without allocating new memory to hold a copy of the 
character sequence. Instead, the string can share the memory used by the 
string buffer. Any subsequent operation that alters the content or 
capacity of the string buffer must then make a copy of the internal 
buffer at that time. This strategy is effective for reducing the amount 
of memory allocated by a string concatenation operation when it is 
implemented using a string buffer.

Thanks for pointing that out!  -Marshall
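
For reference, here is a minimal sketch of the "current" approach discussed above: a fixed 10,000-char buffer, a StringBuffer preallocated from an estimate of the file size, and a loop that checks how many chars each read() actually returned. The class name ChunkedRead, the method readAll, and the expectedChars parameter are made up for illustration; this is not the actual UIMA code.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class ChunkedRead {
    // Hypothetical helper, not the UIMA implementation.
    // expectedChars is the caller's estimate (for example, file length
    // in bytes / 2), used only to size the StringBuffer up front.
    static String readAll(Reader reader, int expectedChars) throws IOException {
        char[] buf = new char[10000];
        StringBuffer sb = new StringBuffer(Math.max(expectedChars, 16));
        int n;
        // read() may return fewer chars than requested, so append only
        // the n chars actually read; -1 signals end of stream.
        while ((n = reader.read(buf, 0, buf.length)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = readAll(new StringReader("some file contents"), 9);
        System.out.println(text.length());
    }
}
```

Preallocating only reduces the garbage from repeated expansion; if the estimate is too low the buffer still grows as needed, so the result is correct either way.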
