poi-dev mailing list archives

From Yegor Kozlov <ye...@dinom.ru>
Subject Re: a 'lite' version of ooxml-schemas jar
Date Tue, 24 Nov 2009 10:02:56 GMT

Finally it settled in my head. I finished improving build.xml and all seems to be working OK.
I named the new artifact poi-ooxml-schemas; the prefix poi- clearly indicates that it is a
POI derivative of ooxml-schemas.jar.

I also think it is a reasonable idea to include poi-ooxml-schemas in the dist target and make
it the default provider of the ooxml xml beans. This means that the "big" ooxml-schemas-1.0.jar
is used only for development. Normal POI releases as well as the Maven POMs will use the "lite" jar.

Below is the current structure of a POI release bundle:

#lib is unchanged
lib/commons-logging-1.1.jar
lib/junit-3.8.1.jar
lib/log4j-1.2.13.jar

#ooxml-schemas-1.0.jar is excluded
ooxml-lib/dom4j-1.6.1.jar
ooxml-lib/geronimo-stax-api_1.0_spec-1.0.jar
ooxml-lib/xmlbeans-2.3.0.jar

poi-3.6-beta1-20091124.jar
poi-scratchpad-3.6-beta1-20091124.jar
poi-contrib-3.6-beta1-20091124.jar
poi-ooxml-3.6-beta1-20091124.jar
poi-ooxml-schemas-3.6-beta1-20091124.jar  #new artifact, replaces ooxml-schemas-1.0.jar
poi-examples-3.6-beta1-20091124.jar       #new artifact, was requested in Bugzilla

For Maven this change is transparent: the POM for the poi-ooxml module depends on poi-ooxml-schemas
instead of ooxml-schemas. This means Maven users will only need to update the version of POI from
3.5-FINAL to 3.6; the rest will be handled by Maven automatically.

Yegor

> Hi Yegor,
> 
> +1
> 
> This will have effects on the website re-write.
> 
> (1) The "How to Build" page has a list of common targets. Here is what I 
> have currently:
> 
> clean -- Erase all build work products (i.e. everything in the build 
> directory)
> compile -- Compiles all files from main, contrib and scratchpad
> test -- Run all unit tests from main, contrib and scratchpad (JUnit)
> jar -- Produce jar files
> docs -- Generate all documentation for the system (Apache Forrest)
> dist -- Create a distribution (JUnit and Apache Forrest)
> 
> This should always be part of the dist target. Should we add a target 
> for building a "lite" ooxml, or will this always be part of jar and test?
> 
> I think we should have a "lite" target separate from jar and test.
> 
> (2) I am reworking the home page. There is a table of components that 
> appear there.
> 
> Document -- Component -- JAR -- Maven artifactId
> OLE2 Filesystem -- POIFS -- poi-version-yyyymmdd.jar -- poi
> OLE2 Property Sets -- HPSF -- poi-version-yyyymmdd.jar -- poi
> Excel XLS -- HSSF -- poi-version-yyyymmdd.jar -- poi
> Excel XLSX -- XSSF -- poi-ooxml-version-yyyymmdd.jar -- poi-ooxml
> PowerPoint PPT -- HSLF -- poi-scratchpad-version-yyyymmdd.jar -- poi-scratchpad
> PowerPoint PPTX -- XSLF -- poi-ooxml-version-yyyymmdd.jar -- poi-ooxml
> Word DOC -- HWPF -- poi-scratchpad-version-yyyymmdd.jar -- poi-scratchpad
> Word DOCX -- XWPF -- poi-ooxml-version-yyyymmdd.jar -- poi-ooxml
> Visio VSD -- HDGF -- poi-scratchpad-version-yyyymmdd.jar -- poi-scratchpad
> Publisher PUB -- HPBF -- poi-scratchpad-version-yyyymmdd.jar -- poi-scratchpad
> Outlook MSG -- HSMF -- poi-scratchpad-version-yyyymmdd.jar -- poi-scratchpad
> 
> I am missing the OOXML schemas in my list. With this new lite version I 
> need two rows.
> 
> OOXML Schemas -- OpenXML4J -- ooxml-schemas-yyyymmdd.jar -- poi-ooxml
> OOXML Lite -- OpenXML4J -- ooxml-schemas-lite-yyyymmdd.jar -- poi-ooxml-lite
> 
> We will need to include poi-ooxml-version-yyyymmdd.jar in the 
> poi-ooxml-lite target as well. I'll mark the XLSX, XWPF, and XSLF rows 
> appropriately.
> 
> Correct?
> 
> (3) I'll rewrite your description as a new page within the currently 
> very sparse OOXML documentation.
> 
> BTW - the www.openxml4j.org domain has gone away and I am going to need 
> help from you in deciding additional documentation and OPC examples that 
> we should include for the OOXML sub-project.
> 
> Regards,
> Dave
> 
> On Nov 16, 2009, at 8:53 AM, Yegor Kozlov wrote:
> 
>> Hi All,
>>
>> As we discussed at ApacheCon, one way to optimize the size of POI 
>> distributions is to create a 'lite' version of the ooxml-schemas jar.
>> The idea is simple: remove all unused classes and resources from the 
>> jar generated by XMLBeans. Rough estimations made at the Barcamp 
>> showed that POI uses less than 30% of the OOXML schemas, hence the 
>> optimized jar should be significantly smaller.
>>
>> With this in mind I created a simple utility called OOXMLLite, see 
>> http://svn.apache.org/repos/asf/poi/trunk/src/ooxml/java/org/apache/poi/util/OOXMLLite.java
>>
>> The process includes four simple steps:
>>
>> - run all ooxml unit tests
>> - see what classes from the ooxml-schemas.jar are loaded in the JVM
>> - copy the loaded classes into some directory.
>> - copy the binary resources (.xsb)
>>
>> A good acceptance test is to run the ooxml unit tests against the 
>> 'lite' classes - all should pass. There is an accompanying Ant task 
>> ooxml-xsds-lite for that, see build.xml.
>>
>> The resulting 'lite' jar is much smaller: 
>> ooxml-schemas-lite-3.6-beta1.jar is only 3.5 MB while the 'big' 
>> ooxml-schemas-1.0.jar is 14.5 MB. In theory, the size can be trimmed 
>> down below 3 MB: my utility currently copies all .xsb files and does 
>> not yet track resource dependencies.
>>
>> I propose to include ooxml-schemas-lite in the release cycle. The 
>> artifact name is ooxml-schemas-lite-${version.id}.jar.
>> Interested projects (first of all I mean Apache Tika) can set up their 
>> Maven poms to use <artifactId>poi-ooxml-lite</artifactId> instead of
>> <artifactId>poi-ooxml</artifactId>. This will reduce the distribution
>> size by approximately 10 MB.
>>
>> Yegor
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
>> For additional commands, e-mail: dev-help@poi.apache.org
>>
> 
> 

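The last two steps of the process quoted above (copy the loaded classes, then copy the .xsb
resources) can be illustrated with a minimal sketch. This is not the OOXMLLite source. It
assumes the big jar has been unpacked into a directory and that the names of the loaded
classes have already been collected somehow and are passed in as a list. The class name
LiteJarSketch, its method names and the command-line layout are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of POI: copies only the classes that were
// actually loaded, plus all .xsb resources, from an unpacked "big" jar.
public class LiteJarSketch {

    // Maps a class name such as "a.b.C" to the relative path "a/b/C.class".
    static String classFilePath(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Copies the .class file of every listed class that exists under srcRoot.
    // Classes that live elsewhere (JDK, POI itself) are skipped silently.
    static int copyLoadedClasses(Collection<String> loadedClassNames,
                                 Path srcRoot, Path destRoot) throws IOException {
        int copied = 0;
        for (String name : loadedClassNames) {
            Path relative = Paths.get(classFilePath(name));
            Path src = srcRoot.resolve(relative);
            if (!Files.isRegularFile(src)) {
                continue; // not a class from the schemas directory
            }
            Path dest = destRoot.resolve(relative);
            Files.createDirectories(dest.getParent());
            Files.copy(src, dest, StandardCopyOption.REPLACE_EXISTING);
            copied++;
        }
        return copied;
    }

    // Copies every .xsb file under srcRoot, keeping the directory layout.
    // Like the utility described in the thread, it copies all of them and
    // does not track which ones are really needed.
    static int copyXsbResources(Path srcRoot, Path destRoot) throws IOException {
        List<Path> xsbFiles;
        try (Stream<Path> walk = Files.walk(srcRoot)) {
            xsbFiles = walk
                    .filter(p -> Files.isRegularFile(p) && p.toString().endsWith(".xsb"))
                    .collect(Collectors.toList());
        }
        for (Path src : xsbFiles) {
            Path dest = destRoot.resolve(srcRoot.relativize(src));
            Files.createDirectories(dest.getParent());
            Files.copy(src, dest, StandardCopyOption.REPLACE_EXISTING);
        }
        return xsbFiles.size();
    }

    // Usage: java LiteJarSketch <unpackedBigJarDir> <liteDir> <className>...
    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: LiteJarSketch <srcDir> <destDir> [className ...]");
            return;
        }
        Path src = Paths.get(args[0]);
        Path dest = Paths.get(args[1]);
        List<String> names = Arrays.asList(args).subList(2, args.length);
        System.out.println("classes copied: " + copyLoadedClasses(names, src, dest));
        System.out.println("xsb files copied: " + copyXsbResources(src, dest));
    }
}
```

The destination directory can then be packaged as the 'lite' jar. As noted in the thread,
running the ooxml unit tests against the result is a good acceptance check.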