poi-dev mailing list archives

From "Height, Jason" <jhei...@subcorp.com.au>
Subject RE: Improving POIFS performance
Date Thu, 09 Oct 2003 05:16:49 GMT
Hopefully, when performance is looked at, we can avoid keeping copies of the
entire file in memory. At the moment, when POIFS reads in a file, all of its
blocks are stored as byte arrays. Interestingly, even though HSSF decodes the
Workbook document stream itself, the byte array allocated within POIFS
remains: if you have a 1 MB Excel file, there is a 1 MB byte array in POIFS
that is never cleared. On top of that come all the temporary byte arrays
that are copied and created each time an HSSF record is read.

Similarly, on the write side HSSF serializes everything into one big byte
array before it is handed to POIFS.
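For comparison, here is a minimal sketch of writing records straight to an OutputStream instead of building one big array first. It assumes the BIFF record framing (a 2-byte record id, or sid, and a 2-byte body length, both little-endian, followed by the body). StreamingRecordWriter is a hypothetical name, not an existing HSSF class; the point is only that no buffer for the whole Workbook stream is ever built.

```java
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch (not an HSSF class): writes BIFF-style records
// (2-byte sid, 2-byte length, both little-endian, then the body)
// directly to the target stream, one record at a time.
class StreamingRecordWriter {
    private final OutputStream out;

    StreamingRecordWriter(OutputStream out) {
        this.out = out;
    }

    // Writes the 4-byte header followed by the body, with no intermediate
    // buffer for the whole document.
    void writeRecord(int sid, byte[] body) throws IOException {
        if (body.length > 0xFFFF) {
            // The length field is only 16 bits wide.
            throw new IllegalArgumentException("record body too large: " + body.length);
        }
        writeShortLE(sid);
        writeShortLE(body.length);
        out.write(body);
    }

    // Writes the low 16 bits of value, low byte first.
    private void writeShortLE(int value) throws IOException {
        out.write(value & 0xFF);
        out.write((value >>> 8) & 0xFF);
    }
}
```

If the target stream is the POIFS document stream itself, only the record currently being written needs to exist as a byte array.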

<begin rant="true" ideas="spurious">
 
Ideally (and it sounds like this is roughly what is planned), there should
be no need to hold or pass around temporary byte arrays; everything could be
done by reading from and writing to streams.
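On the read side, the same framing can be consumed one record at a time. The sketch below again assumes BIFF-style little-endian sid/length headers; StreamingRecordReader and RecordHandler are hypothetical names. It reuses a single buffer that grows to the largest record seen, so memory use is bounded by the biggest record rather than by the file size. It ignores details that real HSSF parsing has to deal with, such as CONTINUE records.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch: reads BIFF-style records (2-byte sid, 2-byte length,
// little-endian) one at a time, never holding the whole stream in memory.
class StreamingRecordReader {
    // Called once per record. 'body' is a shared buffer: only the first
    // 'length' bytes are valid, and only for the duration of the call.
    interface RecordHandler {
        void onRecord(int sid, byte[] body, int length);
    }

    private final InputStream in;
    private byte[] buffer = new byte[1024]; // reused, grown on demand

    StreamingRecordReader(InputStream in) {
        this.in = in;
    }

    // Reads records until end of stream and returns how many were read.
    int readAll(RecordHandler handler) throws IOException {
        int count = 0;
        while (true) {
            int lo = in.read();
            if (lo < 0) {
                return count; // clean end of stream between records
            }
            int sid = lo | (readByte() << 8);
            int length = readByte() | (readByte() << 8);
            if (buffer.length < length) {
                buffer = new byte[length];
            }
            readFully(buffer, length);
            handler.onRecord(sid, buffer, length);
            count++;
        }
    }

    private int readByte() throws IOException {
        int b = in.read();
        if (b < 0) {
            throw new EOFException("truncated record header");
        }
        return b;
    }

    private void readFully(byte[] dst, int length) throws IOException {
        int off = 0;
        while (off < length) {
            int n = in.read(dst, off, length - off);
            if (n < 0) {
                throw new EOFException("truncated record body");
            }
            off += n;
        }
    }
}
```

Because the buffer is overwritten by the next record, a handler has to copy (or decode) anything it wants to keep before returning.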

It would also be cool to be able to "register" known documents with POIFS,
so that the decoded document (e.g. the "Workbook") is held onto rather than
its byte array. For unknown documents we would probably still hold onto the
raw bytes. I was thinking of something like this:

POIFSDocument doc = new POIFSDocument();
doc.registerDocument("Workbook", org.apache.poi.hssf.model.Workbook.class);
doc.setPreserveUnknownDocuments(true);
doc.readDocument(new FileInputStream(theFile));

Internally, POIFSDocument would see that it had hit a "Workbook" stream,
check whether it had a registered "decoder" for it, create a new instance of
that decoder and pass that portion of the stream to it. If we ever get really
adventurous, we could start adding decoders for other documents in the file,
such as macros.
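To make the dispatch idea a little more concrete, here is a rough sketch. DocumentDecoder, DecoderRegistry and ByteCountDecoder are hypothetical names made up for illustration; none of them exist in POI. The registry instantiates the registered class for a known stream name, keeps the decoded object, and keeps the raw bytes of unknown streams only when preservation is switched on.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;

// Hypothetical: anything that can decode one named POIFS stream.
interface DocumentDecoder {
    void decode(InputStream documentStream) throws IOException;
}

// Hypothetical registry mapping stream names (e.g. "Workbook") to decoders.
class DecoderRegistry {
    private final Map<String, Class<? extends DocumentDecoder>> decoders = new HashMap<>();
    private final Map<String, Object> documents = new HashMap<>();
    private boolean preserveUnknown = true;

    void register(String name, Class<? extends DocumentDecoder> type) {
        decoders.put(name, type);
    }

    void setPreserveUnknownDocuments(boolean preserve) {
        this.preserveUnknown = preserve;
    }

    // Called once for each stream found while reading the file system.
    void onDocument(String name, InputStream stream) throws IOException {
        Class<? extends DocumentDecoder> type = decoders.get(name);
        if (type != null) {
            DocumentDecoder decoder;
            try {
                decoder = type.getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IOException("cannot instantiate decoder for " + name, e);
            }
            decoder.decode(stream);
            documents.put(name, decoder); // keep the decoded form, not the bytes
        } else if (preserveUnknown) {
            documents.put(name, readAllBytes(stream)); // unknown: keep raw bytes
        }
        // otherwise the stream is simply dropped
    }

    Object getDocument(String name) {
        return documents.get(name);
    }

    private static byte[] readAllBytes(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }
}

// Illustrative decoder only: it just counts the bytes it is given.
class ByteCountDecoder implements DocumentDecoder {
    int byteCount;

    @Override
    public void decode(InputStream documentStream) throws IOException {
        while (documentStream.read() != -1) {
            byteCount++;
        }
    }
}
```

Registering a Class and creating it reflectively mirrors the registerDocument(name, Class) idea; a factory object per name would avoid the reflection if that turned out to be preferable.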

The HSSFWorkbook constructor would then need to change to something like:

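public HSSFWorkbook(File f) throws IOException {
  // proposed API: decode the "Workbook" stream straight into the HSSF model
  POIFSDocument doc = new POIFSDocument();
  doc.registerDocument("Workbook", org.apache.poi.hssf.model.Workbook.class);
  doc.setPreserveUnknownDocuments(true);
  InputStream in = new FileInputStream(f);
  try {
    doc.readDocument(in);
  } finally {
    in.close(); // don't leak the file handle
  }
  setModel((Workbook) doc.getDocument("Workbook"));
}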

</begin>

What I have outlined may not suit every use case, in which case support for
a byte array approach like the current one would still need to be provided.

Jason


-----Original Message-----
From: Glen Stampoultzis [mailto:gstamp@iinet.net.au] 
Sent: Thursday, 9 October 2003 1:03 PM
To: POI Developers List
Subject: Re: Improving POIFS performance

At 11:35 AM 9/10/2003, you wrote:
> > So Glen said he should have time to help integrate it and shepherd it
> > and Chris as a committer.
>
>Sounds good, FYI there have been some bug fixes to the code since you
>had a look at it. Glen, if/when you need an updated set just let me know.
>
>BTW I am not an Apache committer but am pretty involved with a couple of
>open source projects, so hopefully as a sheep I will not be too unruly.

I look forward to looking into what you've written. It'll probably be on the
weekend, as I've just started a new job and want to get off to a good start
first.

Regards,


Glen Stampoultzis
gstamp@iinet.net.au
http://members.iinet.net.au/~gstamp/glen/
--------------------------------------------------------------------------------------------------------------------
This e-mail (including attachments) is confidential information of Australian Submarine Corporation
Pty Limited (ASC).  It may also be legally privileged.  Unauthorised use and disclosure is
prohibited.  ASC is not taken to have waived confidentiality or privilege if this e-mail was
sent to you in error. If you have received it in error, please notify the sender promptly.
 While ASC takes steps to identify and eliminate viruses, it cannot confirm that this e-mail
is free from them.  You should scan this e-mail for viruses before it is used.  The statements
in this e-mail are those of the sender only, unless specifically stated to be those of ASC
by someone with authority to do so.
