uima-dev mailing list archives

From "Adam Lally" <ala...@alum.rpi.edu>
Subject Re: Is buffering needed when setting org.xml.sax.InputSource.InputSource?
Date Mon, 12 Feb 2007 19:40:20 GMT
My understanding is that if one does reads like the
currently-checked-in FileUtils, where you read into a buffer of any
reasonable size, there is no additional advantage to using a
BufferedInputStream (you are essentially implementing buffering
yourself anyway).  The advantage comes if you want to use the 1-byte
read method, since this would be highly inefficient if you did not use
a BufferedInputStream to manage the buffer for you.
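
To make the distinction concrete, here is a minimal sketch of the two read styles. The class and method names are made up for illustration and this is not the actual FileUtils code; the 8192-byte buffer is just an assumed "reasonable size".

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper class, not part of UIMA.
class BlockReadSketch {

    // Style 1: read into our own buffer. Each read() call moves up to
    // 8192 bytes, so wrapping 'in' in a BufferedInputStream adds nothing.
    static byte[] readInBlocks(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192]; // assumed size, any reasonable value works
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // Style 2: the 1-byte read() method. Here the BufferedInputStream
    // wrapper matters, because it serves most calls from its internal
    // buffer instead of going to the underlying stream every time.
    static byte[] readByteWise(InputStream in) throws IOException {
        InputStream buffered = new BufferedInputStream(in);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int b;
        while ((b = buffered.read()) != -1) {
            out.write(b);
        }
        return out.toByteArray();
    }
}
```

Both methods return the same bytes; the only difference is who manages the buffer.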

If Xerces didn't perform well when passed a FileInputStream, I'd say
that would be a bug in Xerces for sure.  It would be terrible to force
your users to create a BufferedInputStream every time they wanted to
parse something at a reasonable speed.

Tweaking buffer sizes could help, I guess; feel free to run a test.  My
gut says the Xerces default will perform just fine, or somebody would
have changed it by now.
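
For anyone who wants to run that test, a rough sketch of setting the buffer size could look like the following. The property URI is the one quoted below; the class name, method name, and the element-counting handler are made up for illustration. Whether the property is recognized depends on which parser the JAXP factory returns, so the sketch falls back to the parser default if it is rejected.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical test helper, not part of UIMA or Xerces.
class BufferSizeSketch {

    static final String BUFFER_SIZE_PROPERTY =
        "http://apache.org/xml/properties/input-buffer-size";

    // Parses the document with the requested input buffer size (if the
    // parser accepts the property) and returns the number of elements seen.
    static int countElements(byte[] xml, int bufferSize) throws Exception {
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        try {
            reader.setProperty(BUFFER_SIZE_PROPERTY, Integer.valueOf(bufferSize));
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            // Assumption: a parser that is not Xerces-based may reject the
            // property; in that case keep its default buffer size.
        }
        final int[] count = {0};
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count[0]++;
            }
        });
        // A plain, unbuffered stream is passed in on purpose.
        reader.parse(new InputSource(new ByteArrayInputStream(xml)));
        return count[0];
    }
}
```

Timing calls to this with a few different buffer sizes on a large file would show whether the setting matters in practice.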

-Adam

On 2/12/07, Marshall Schor <msa@schor.com> wrote:
> Adam Lally wrote:
> > I doubt it.  Is there something that led you to believe this would be
> > necessary?
>
> Just doing some code inspection and seeing this - that it is perfectly
> feasible to pass a buffered version of the input to this, and that the
> general contract for IO seems to imply that you should use buffering for
> performance considerations. But I see from some web surfing that the
> Xerces impl does some buffering, and you can set the buffer size via a
> property (do we do that?  default = 2k I think, and the Apache license
> is about 1K by itself :-) ).
>
> I guess some simple test would tell...
>
> Some web surfing turned up:
>
> Parsers like Apache Xerces have the ability to set the input buffer size:
>
> // Set the chunk to read in by SAX
> parser.setProperty("http://apache.org/xml/properties/input-buffer-size",
>     new Integer(2048));
>
> See also http://xerces.apache.org/xerces2-j/properties.html
> which gives some advice on how large to set this.
>
> -Marshall
>
>
