poi-dev mailing list archives

From Chris Nokleberg <ch...@sixlegs.com>
Subject Re: Improving POIFS performance
Date Thu, 09 Oct 2003 05:55:58 GMT
On Thu, Oct 09, 2003 at 02:46:49PM +0930, Height, Jason wrote:
> Ideally, and it sounds like this is the rough idea with what is going to
> happen, there should be no need for temporary byte arrays to be either held
> or passed around and everything could be done via reading and writing to
> streams.

Yes, although the changes to POIFS will not have a tremendous initial
impact on performance, since HSSF will still be buffering all of the
records (I have ideas for changing that, but it is a lot of work).
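Doing everything via streams, as the quoted text suggests, means data passes through a small, bounded buffer instead of being materialised as one large temporary byte array. The class name and buffer-size parameter below are illustrative only, not part of POIFS; on Java 9+ the standard InputStream.transferTo does the same job.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Illustrative helper (hypothetical name, not part of POIFS).
final class StreamCopy {
    // Copies in fixed-size chunks, so memory use stays at bufferSize
    // no matter how long the stream is.
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        byte[] buf = new byte[bufferSize];
        long total = 0;
        int n;
        // read() returns -1 at end of stream; otherwise at least one byte.
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }
}
```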

> Additionally it would be cool to be able to "register" known documents with
> POIFS so that the decoded document, i.e. "Workbook", could be held onto rather
> than the byte array. Sure, for unknown documents we would probably hold onto
> the raw bytes. I was thinking of something like this:

In POIFS2 this is the Stream class, e.g.
  Document doc = new Document(new SeekableFile(file));
  Stream workbook = doc.findStream("Workbook");
  InputStream in = workbook.getInputStream();

Stream also has methods for getting at the underlying data in a
random-access fashion, for higher-level APIs that can leverage that.
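To show what a stream offering both sequential and random access might look like, here is a minimal sketch loosely modelled on the Stream idea above. It is not the actual POIFS2 class: the interface shape, the read(position, ...) method and the in-memory ByteArrayStream are assumptions made only so the example runs.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

// Hypothetical interface: illustrates the idea, not the real POIFS2 API.
interface Stream {
    long size();

    // Sequential view, for callers that simply read straight through.
    InputStream getInputStream();

    // Random-access view: copy up to len bytes starting at position into buf.
    // Returns the number of bytes copied, or -1 if position is past the end.
    int read(long position, byte[] buf, int off, int len);
}

// In-memory implementation, used only to make the sketch runnable.
final class ByteArrayStream implements Stream {
    private final byte[] data;

    ByteArrayStream(byte[] data) {
        this.data = data;
    }

    @Override
    public long size() {
        return data.length;
    }

    @Override
    public InputStream getInputStream() {
        return new ByteArrayInputStream(data);
    }

    @Override
    public int read(long position, byte[] buf, int off, int len) {
        if (position < 0 || off < 0 || len < 0 || off + len > buf.length) {
            throw new IndexOutOfBoundsException();
        }
        if (position >= data.length) {
            return -1;
        }
        int n = (int) Math.min(len, data.length - position);
        System.arraycopy(data, (int) position, buf, off, n);
        return n;
    }
}
```

A file-backed implementation would answer read(position, ...) with a positional file read instead of an array copy, which is what lets higher-level APIs skip straight to the records they need.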

> POIFSDocument doc = new POIFSDocument();
> doc.registerDocument("WorkBook", org.apache.poi.hssf.model.Workbook.class);
> doc.setPreserveUnknownDocuments(true);
> doc.readDocument(new FileInputStream(theFile));
> 
> Internally, POIFSDocument would see that it had hit a "Workbook", see if it
> had a registered "decoder", create a new instance of it and then pass that
> portion of the stream to the document decoder. If we ever get really
> adventurous we can start to add decoders for other documents in the file
> such as macros, etc.
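The registration scheme in the quoted proposal amounts to a lookup from stream name to decoder, with a raw-bytes fallback when unknown documents are preserved. A minimal sketch, assuming hypothetical StreamDecoder and DecoderRegistry types (neither exists in POI); it also swaps the proposal's registered Class plus reflective instantiation for a small decoder interface.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical types: they only illustrate the dispatch, not any POI API.
interface StreamDecoder<T> {
    T decode(InputStream in) throws IOException;
}

final class DecoderRegistry {
    private final Map<String, StreamDecoder<?>> decoders = new HashMap<>();

    // Assumption of this sketch: names are matched case-insensitively.
    void register(String streamName, StreamDecoder<?> decoder) {
        decoders.put(streamName.toLowerCase(Locale.ROOT), decoder);
    }

    // Returns the decoded object if a decoder is registered for this name;
    // otherwise keeps the raw bytes ("preserve unknown documents").
    Object decode(String streamName, InputStream in) throws IOException {
        StreamDecoder<?> decoder = decoders.get(streamName.toLowerCase(Locale.ROOT));
        if (decoder != null) {
            return decoder.decode(in);
        }
        return in.readAllBytes(); // Java 9+
    }
}
```

Because matching is case-insensitive here, a decoder registered as "Workbook" is also found for "WorkBook".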

Registration would be useful primarily when reading from an
InputStream. When the underlying data is a file, the POIFS2 document
reader can just pull in the table of contents and wait for you to ask
for a Stream. Given all of the other benefits, it will almost always be
better to just pipe the stream to a file and then create the Document,
so I doubt an event-based API will be worthwhile. (It would be very rare
for a document not to be in a file anyway; even for HTTP file uploads,
the libraries buffer large posts to disk.)
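The "pipe the stream to a file first" approach can be sketched with standard java.nio calls: spool the InputStream to a temporary file, then read only the fixed 512-byte OLE2 header and leave everything else on disk until it is asked for. The SpoolToFile class and its method names are hypothetical; handing the file to a POIFS2 Document is not shown, since that API is only proposed here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: spooling plus lazy header read, using only JDK APIs.
final class SpoolToFile {
    // First 8 bytes of every OLE2 compound file.
    private static final byte[] OLE2_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1
    };

    // Copy an arbitrary InputStream to a temp file so it can be read randomly.
    // REPLACE_EXISTING is needed because createTempFile already made the file.
    static Path spool(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("poifs-", ".tmp");
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    // Read only the 512-byte header with positional reads; the rest of the
    // file is not touched.
    static ByteBuffer readHeader(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer header = ByteBuffer.allocate(512);
            while (header.hasRemaining()) {
                if (ch.read(header, header.position()) < 0) {
                    throw new IOException("file shorter than a 512-byte header");
                }
            }
            header.flip();
            return header;
        }
    }

    static boolean hasOle2Signature(ByteBuffer header) {
        for (int i = 0; i < OLE2_SIGNATURE.length; i++) {
            if (header.get(i) != OLE2_SIGNATURE[i]) {
                return false;
            }
        }
        return true;
    }
}
```

The temp file should be deleted once the document is closed; Files.deleteIfExists is enough for that.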

Chris

---------------------------------------------------------------------
To unsubscribe, e-mail: poi-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: poi-dev-help@jakarta.apache.org

