poi-dev mailing list archives

From Yegor Kozlov <ye...@dinom.ru>
Subject Re: a 'lite' version of ooxml-schemas jar
Date Tue, 17 Nov 2009 08:30:52 GMT
Hi Dave,

> 
> I've been digging deeper into the dependencies in maven. I think that 
> "lite" should become the usual way to build.
> 
> (1) maven/poi-ooxml.pom is missing two dependencies:
> 
> xmlbeans-2.3.0.jar
>   <property name="ooxml.xmlbeans.jar" location="${ooxml.lib}/xmlbeans-2.3.0.jar"/>
>   <property name="ooxml.xmlbeans.url" value="${repository.m2}/maven2/org/apache/xmlbeans/xmlbeans/2.3.0/xmlbeans-2.3.0.jar"/>
>

It is a chained (transitive) dependency: xmlbeans-2.3.0.jar is already declared in ooxml-schemas.pom.

> geronimo-stax-api_1.0_spec-1.0.jar
>   <property name="ooxml.jsr173.jar" location="${ooxml.lib}/geronimo-stax-api_1.0_spec-1.0.jar"/>
>   <property name="ooxml.jsr173.url" value="${repository.m2}/maven2/org/apache/geronimo/specs/geronimo-stax-api_1.0_spec/1.0/geronimo-stax-api_1.0_spec-1.0.jar"/>
>
> Is there a reason we left these out of the pom?

We need geronimo-stax only to build ooxml-schemas.jar; it is not needed at
runtime, so Maven users don't need it either.

> 
>>> I propose to include ooxml-schemas-lite in the release cycle. The 
>>> artifact name is ooxml-schemas-lite-${version.id}.jar.
>>> Interested projects (first of all I mean Apache Tika) can set up their 
>>> Maven poms to use <artifactId>poi-ooxml-lite</artifactId> instead of 
>>> <artifactId>poi-ooxml</artifactId>. This will reduce the distribution 
>>> size by approximately 10 MB.
> 
> (2) You propose a new artifact-id of ooxml-schemas-lite. I think a name 
> like ooxml-poi, poi-ooxml-schemas, or poi-opc would be better.
> 


> There are a few points to make here:
> 
> - ooxml-schemas has a different versioning - it is version 1.0. It 
> should not change much. We should have a documented build target for this.
> 
> - ooxml lite - should follow the poi versioning scheme, since newer 
> versions of POI will cover more of the schema. So it is not so much 
> a subset of ooxml-schema as a cross-reference between 
> ooxml-schema and poi-ooxml.
> 
> Which version should poi-ooxml use, "lite" or ooxml-schemas? I think we 
> should always use "lite" and distribute lite. We can put the "lite" 
> classes in one of two places:
> 

My plan was to distribute 'lite' only as a supplemental jar. Also, I will
consider switching the project to use "lite" only once I have feedback
from users.


> (a) In the poi-ooxml jar as part of that build.
> (b) In its own jar under a new maven artifact-id. I like ooxml-poi
> 
> I think (b) is better, but if a user is working on ooxml support in 
> poi-ooxml then it is likely that they will be covering parts of the 
> schema not yet covered by "lite".
> 
> Users will still want to work with the full schemas; they need to make a 
> choice when they build - either with a special target or by copying the 
> big jar into ooxml-lib/
> 

Development builds should always use the "full" jar, and /ooxml-lib should
contain ooxml-schemas-1.0.jar and no other versions of ooxml-schemas.
Actually, this is the only way it can work: the "lite" jar is derived from
the "full" jar; it is not an alternative to it.
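
To illustrate what "derived" means here, below is a minimal Java sketch. It
is not the actual OOXMLLite code. It assumes the set of schema class names
loaded during the ooxml unit tests has already been collected by some other
means, and it copies only those classes plus all .xsb resources out of the
full jar. The class name LiteJarSketch, the methods keep and writeLiteJar,
and the usedClasses parameter are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.Set;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

// Hypothetical helper, not part of POI: derives a "lite" jar from the "full" jar.
final class LiteJarSketch {

    /** Decides whether an entry of the full jar belongs in the lite jar. */
    static boolean keep(String entryName, Set<String> usedClasses) {
        if (entryName.endsWith(".xsb")) {
            // Binary schema resources: copy all of them, without tracking
            // which ones are really needed.
            return true;
        }
        if (entryName.endsWith(".class")) {
            // Convert "a/b/C.class" to the binary class name "a.b.C".
            String className = entryName
                    .substring(0, entryName.length() - ".class".length())
                    .replace('/', '.');
            return usedClasses.contains(className);
        }
        return false; // everything else is dropped in this sketch
    }

    /** Copies the kept entries from fullJar into liteJar; returns how many were copied. */
    static int writeLiteJar(Path fullJar, Path liteJar, Set<String> usedClasses)
            throws IOException {
        int copied = 0;
        try (JarFile in = new JarFile(fullJar.toFile());
             OutputStream fileOut = Files.newOutputStream(liteJar);
             JarOutputStream out = new JarOutputStream(fileOut)) {
            Enumeration<JarEntry> entries = in.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (entry.isDirectory() || !keep(entry.getName(), usedClasses)) {
                    continue;
                }
                out.putNextEntry(new JarEntry(entry.getName()));
                try (InputStream data = in.getInputStream(entry)) {
                    data.transferTo(out); // requires Java 9 or later
                }
                out.closeEntry();
                copied++;
            }
        }
        return copied;
    }
}
```

Since every entry of the lite jar is read from the full jar, a build that
produces "lite" still needs ooxml-schemas-1.0.jar in place first. Note that
usedClasses must hold binary names, so nested classes appear as Outer$Inner.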

> In general users will want to use the "lite" jar. We can provide access 
> to the full ooxml-schema as a replacement. Is it possible to have 
> "selective" targets in a maven pom? Can we make poi-ooxml dependent on 
> either "ooxml-poi" or "ooxml-schema"?
> 
> For the build I think that an explicit target should be used called 
> "ooxml" - this will perform your full task and make sure that the build 
> environment is using "lite" and not "full". I suspect that this target 
> may move some files around. We'll need to explain that adding support 
> for parts of the schema means adding unit tests. These unit tests should 
> help us with documentation on the OOXML formats.
> 

>> (2) I am reworking the home page. There is a table of components that 
>> appear there.
>>
>> Document -- Component -- JAR -- Maven artifactId
>> OLE2 Filesystem -- POIFS -- poi-version-yyyymmdd.jar -- poi
>> OLE2 Property Sets -- HPSF -- poi-version-yyyymmdd.jar -- poi
>> Excel XLS -- HSSF -- poi-version-yyyymmdd.jar -- poi
>> Excel XLSX -- XSSF -- poi-ooxml-version-yyyymmdd.jar -- poi-ooxml
>> PowerPoint PPT -- HSLF -- poi-scratchpad-version-yyyymmdd.jar -- poi-scratchpad
>> PowerPoint PPTX -- XSLF -- poi-ooxml-version-yyyymmdd.jar -- poi-ooxml
>> Word DOC -- HWPF -- poi-scratchpad-version-yyyymmdd.jar -- poi-scratchpad
>> Word DOCX -- XWPF -- poi-ooxml-version-yyyymmdd.jar -- poi-ooxml
>> Visio VSD -- HDGF -- poi-scratchpad-version-yyyymmdd.jar -- poi-scratchpad
>> Publisher PUB -- HPBF -- poi-scratchpad-version-yyyymmdd.jar -- poi-scratchpad
>> Outlook MSG -- HSMF -- poi-scratchpad-version-yyyymmdd.jar -- poi-scratchpad
>>
>> I am missing the OOXML schemas in my list. With this new lite version 
>> I need two rows.
>>
>> OOXML Schemas -- OpenXML4J -- ooxml-schemas-yyyymmdd.jar -- poi-ooxml
>> OOXML Lite -- OpenXML4J -- ooxml-schemas-lite-yyyymmdd.jar -- poi-ooxml-lite
>>
>> We will need to include poi-ooxml-version-yyyymmdd.jar in the 
>> poi-ooxml-lite target as well. I'll mark the XLSX, XWPF, and XSLF rows 
>> appropriately.
>>
>> Correct?
>>
Not quite.
OpenXML4J is a general-purpose API for working with OPC packages; it is a
direct counterpart of POIFS, so it should stay separate.

As for the OOXML Schemas, I would rather not advertise them on the web
site: they are a detail of our internal implementation. Users are advised
to use the common interfaces.


Yegor

>> (3) I'll rewrite your description as a new page within the currently 
>> very sparse OOXML documentation.
>>
>> BTW - the www.openxml4j.org domain has gone away and I am going to 
>> need help from you in deciding additional documentation and OPC 
>> examples that we should include for the OOXML sub-project.
>>
>> Regards,
>> Dave
>>
>> On Nov 16, 2009, at 8:53 AM, Yegor Kozlov wrote:
>>
>>> Hi All,
>>>
>>> As we discussed at Apachecon, one way to optimize the size of POI 
>>> distributions is to create a 'lite' version of the ooxml-schemas jar.
>>> The idea is simple: remove all unused classes and resources from the 
>>> jar generated by XMLBeans. Rough estimations made at the Barcamp 
>>> showed that POI uses less than 30% of the OOXML schemas, hence the 
>>> optimized jar should be significantly smaller.
>>>
>>> With this in mind I created a simple utility called OOXMLLite, see 
>>> http://svn.apache.org/repos/asf/poi/trunk/src/ooxml/java/org/apache/poi/util/OOXMLLite.java

>>>
>>>
>>> The process includes four simple steps:
>>>
>>> - run all ooxml unit tests
>>> - see what classes from the ooxml-schemas.jar are loaded in the JVM
>>> - copy the loaded classes into some directory.
>>> - copy the binary resources (.xsb)
>>>
>>> A good acceptance test is to run the ooxml unit tests against the 
>>> 'lite' classes - all should pass. There is an accompanying Ant task 
>>> ooxml-xsds-lite for that, see build.xml.
>>>
>>> The resulting 'lite' jar is much smaller: 
>>> ooxml-schemas-lite-3.6-beta1.jar is only 3.5 MB while the 'big' 
>>> ooxml-schemas-1.0.jar is 14.5 MB. In theory, the size can be trimmed 
>>> down below 3 MB  - my utility copies all .xsb files and does not yet 
>>> track resource dependencies.
>>>
>>> I propose to include ooxml-schemas-lite in the release cycle. The 
>>> artifact name is ooxml-schemas-lite-${version.id}.jar.
>>> Interested projects (first of all I mean Apache Tika) can set up their 
>>> Maven poms to use <artifactId>poi-ooxml-lite</artifactId> instead of 
>>> <artifactId>poi-ooxml</artifactId>. This will reduce the distribution 
>>> size by approximately 10 MB.
>>>
>>> Yegor
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
>>> For additional commands, e-mail: dev-help@poi.apache.org
>>>