poi-dev mailing list archives

From David Fisher <dfis...@jmlafferty.com>
Subject Re: a 'lite' version of ooxml-schemas jar
Date Wed, 18 Nov 2009 05:03:42 GMT
Hi Yegor,

>> I've been digging deeper into the dependencies in maven. I think  
>> that "lite" should become the usual way to build.
>> (1) maven/poi-ooxml.pom is missing two dependencies:
>> xmlbeans-2.3.0.jar
>>  <property name="ooxml.xmlbeans.jar" location="${ooxml.lib}/xmlbeans-2.3.0.jar"/>
>>  <property name="ooxml.xmlbeans.url" value="${repository.m2}/maven2/org/apache/xmlbeans/xmlbeans/2.3.0/xmlbeans-2.3.0.jar"/>
>
> It is a chained dependency: xmlbeans-2.3.0.jar is included in
> ooxml-schemas.pom.
>
>> geronimo-stax-api_1.0_spec-1.0.jar
>>  <property name="ooxml.jsr173.jar" location="${ooxml.lib}/geronimo-stax-api_1.0_spec-1.0.jar"/>
>>  <property name="ooxml.jsr173.url" value="${repository.m2}/maven2/org/apache/geronimo/specs/geronimo-stax-api_1.0_spec/1.0/geronimo-stax-api_1.0_spec-1.0.jar"/>
>> Is there a reason we left these out of the pom?
>
> We need geronimo-stax only to build ooxml-schemas.jar; it is not
> needed at runtime. Maven users don't need it either.

What build targets require jsr 173? Those will download geronimo-stax  
and the schemas to build ooxml-schemas.

>>>> I propose to include ooxml-schemas-lite in the release cycle. The  
>>>> artifact name is ooxml-schemas-lite-${version.id}.jar.
>>>> Interested projects (first of all I mean Apache Tika) can setup  
>>>> their Maven poms to use <artifactId>poi-ooxml-lite</artifactId>
>>>> instead of <artifactId>poi-ooxml</artifactId>. This will reduce
>>>> the distribution size by approximately 10 MB.
>> (2) You propose a new artifact-id of ooxml-schemas-lite. I think a  
>> name like ooxml-poi, poi-ooxml-schemas, or poi-opc would be better.
>
>
>> There are a few points to make here:
>> - ooxml-schemas has a different versioning - it is version 1.0. It  
>> should not change much. We should have a documented build target  
>> for this.
>> - ooxml lite - should follow the poi versioning schema since newer  
>> versions of POI will cover more of the schema. So, it is not really  
>> quite a sub of ooxml-schema as much as it is a cross reference  
>> between ooxml-schema and poi-ooxml.
>> Which version should poi-ooxml use: "lite" or ooxml-schemas? I think
>> we should always use "lite" and distribute lite. We can put the
>> "lite" classes in one of two places:
>
> My plan was to distribute 'lite' only as a supplemental jar. Also, I  
> will consider switching the project to use "lite" only when I have  
> feedback from users.

OK. Then it will be made by a new build target. What is that target?

>> (a) In the poi-ooxml jar as part of that build.
>> (b) In its own jar under a new maven artifact-id. I like ooxml-poi.
>> I think (b) is better, but if a user is working on ooxml support in
>> poi-ooxml then it is likely that they will be covering parts
>> of the schema not yet covered by "lite".
>> Users will still want to work with the full schemas; they need to
>> make a choice when they build - either with a special target or by
>> copying the big jar in ooxml-lib/
>
> Development builds should always use the "full" jar and /ooxml-lib
> should contain ooxml-schemas-1.0.jar and no other versions of
> ooxml-schemas. Actually it is the only way it can work - the "lite" jar is
> a derivative from the "full" jar, it is not an alternative.

OK. Lite jar is acceptable as a supplemental and I will "document" why  
it should be used.
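
For that documentation, here is roughly how I picture the "lite" jar being derived from the "full" jar. This is only a minimal sketch under my own assumptions, not the actual OOXMLLite utility: the class name LiteJarSketch and the method copyUsedEntries are made up for illustration, and the set of loaded class names is assumed to have been collected elsewhere (for example while the ooxml unit tests run). It keeps the classes that were loaded plus every .xsb resource and skips the rest.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Enumeration;
import java.util.Set;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical illustration only; this is not the real OOXMLLite code. */
final class LiteJarSketch {

    /**
     * Copies into destDir every entry of fullJar that is either a class whose
     * fully qualified name is in loadedClasses, or a binary .xsb resource.
     * Returns the number of entries copied.
     */
    static int copyUsedEntries(Path fullJar, Set<String> loadedClasses, Path destDir)
            throws IOException {
        Path root = destDir.toAbsolutePath().normalize();
        int copied = 0;
        try (JarFile jar = new JarFile(fullJar.toFile())) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (entry.isDirectory()) {
                    continue;
                }
                String name = entry.getName();
                if (!name.endsWith(".xsb") && !isLoadedClass(name, loadedClasses)) {
                    continue; // unused class or other resource: leave it out
                }
                Path target = root.resolve(name).normalize();
                if (!target.startsWith(root)) {
                    throw new IOException("Entry escapes target directory: " + name);
                }
                Files.createDirectories(target.getParent());
                try (InputStream in = jar.getInputStream(entry)) {
                    Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                }
                copied++;
            }
        }
        return copied;
    }

    /** Maps a jar entry such as a/b/C.class to a.b.C and looks it up in the set. */
    private static boolean isLoadedClass(String entryName, Set<String> loadedClasses) {
        if (!entryName.endsWith(".class")) {
            return false;
        }
        String className = entryName
                .substring(0, entryName.length() - ".class".length())
                .replace('/', '.');
        return loadedClasses.contains(className);
    }
}
```

The point of the sketch is that the lite jar can only ever contain a subset of the full jar, which is why the full jar has to stay in ooxml-lib for development builds.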

Is there a reason for maven users to care about the "lite"? If there
is, then we need to worry about alternative dependencies for
ooxml-schemas. If not, then we are good.


>> In general users will want to use the "lite" jar. We can provide  
>> access to the full ooxml-schema as a replacement. Is it possible to  
>> have "selective" targets in a maven pom? Can we make poi-ooxml  
>> dependent on either "ooxml-poi" or "ooxml-schema"?
>> For the build I think that an explicit target should be used called  
>> "ooxml" - this will perform your full task and make sure that the  
>> build environment is using "lite" and not "full". I suspect that  
>> this target may move some files around. We'll need to explain that  
>> adding support for parts of the schema means adding unit tests.
>> These unit tests should help us with documentation on the OOXML
>> formats.
>
>>> (2) I am reworking the home page. There is a table of components  
>>> that appear there.
>>>
>>> Document -- Component -- JAR -- Maven artifactId
>>> OLE2 Filesystem -- POIFS -- poi-version-yyyymmdd.jar -- poi
>>> OLE2 Property Sets -- HPSF -- poi-version-yyyymmdd.jar -- poi
>>> Excel XLS -- HSSF -- poi-version-yyyymmdd.jar -- poi
>>> Excel XLSX -- XSSF -- poi-ooxml-version-yyyymmdd.jar -- poi-ooxml
>>> PowerPoint PPT -- HSLF -- poi-scratchpad-version-yyyymmdd.jar -- poi-scratchpad
>>> PowerPoint PPTX -- XSLF -- poi-ooxml-version-yyyymmdd.jar -- poi-ooxml
>>> Word DOC -- HWPF -- poi-scratchpad-version-yyyymmdd.jar -- poi-scratchpad
>>> Word DOCX -- XWPF -- poi-ooxml-version-yyyymmdd.jar -- poi-ooxml
>>> Visio VSD -- HDGF -- poi-scratchpad-version-yyyymmdd.jar -- poi-scratchpad
>>> Publisher PUB -- HPBF -- poi-scratchpad-version-yyyymmdd.jar -- poi-scratchpad
>>> Outlook MSG -- HSMF -- poi-scratchpad-version-yyyymmdd.jar -- poi-scratchpad
>>>
>>> I am missing the OOXML schemas in my list. With this new lite  
>>> version I need two rows.
>>>
>
>
>>> OOXML Schemas -- OpenXML4J -- ooxml-schemas-yyyymmdd.jar -- poi-ooxml
>>> OOXML Lite -- OpenXML4J -- ooxml-schemas-lite-yyyymmdd.jar -- poi-ooxml-lite
>>>
>>> We will need to include poi-ooxml-version-yyyymmdd.jar in the
>>> poi-ooxml-lite target as well. I'll mark the XLSX, XWPF, and XSLF rows
>>> appropriately.
>>>
>>> Correct?
>>>
> Not quite.
> OpenXML4J is a general-purpose API to work with OPC packages; it is
> a direct counterpart of POIFS. So, it should stay separate.

And that is part of poi-ooxml. Correct?

> As to OOXML Schemas, I would rather not advertise them on the web  
> site - it is a detail of our internal implementation. Users are  
> advised to use common interfaces.

I will document them on the website because they are dependencies of  
the maven poms.

I will say that one would not normally want to build the
ooxml-schemas. However, I could see people wanting to understand what
changing the schema might mean.

However, with a lite version, I can see a developer of new ooxml
features wanting to test their coverage and write unit tests.
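
To make "test their coverage" a bit more concrete, here is one way such a check might look. It is only a sketch under my own assumptions: LiteCoverageSketch and missingClasses are hypothetical names, not anything in POI, and the list of required schema classes would have to come from the developer. It simply reports which of those classes are not present in a given jar, such as the lite jar.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;

/** Hypothetical illustration only; not part of POI. */
final class LiteCoverageSketch {

    /**
     * Returns the fully qualified class names from requiredClasses that have
     * no matching .class entry in the jar at jarPath, in the order given.
     */
    static List<String> missingClasses(Path jarPath, List<String> requiredClasses)
            throws IOException {
        List<String> missing = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath.toFile())) {
            for (String className : requiredClasses) {
                // a.b.C is stored in the jar as a/b/C.class
                String entryName = className.replace('.', '/') + ".class";
                if (jar.getEntry(entryName) == null) {
                    missing.add(className);
                }
            }
        }
        return missing;
    }
}
```

An empty result would suggest the lite jar already covers the classes the new feature needs; anything listed would mean the developer needs the full jar and a unit test that exercises the new classes.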

Regards,
Dave



>
>
> Yegor
>
>>> (3) I'll rewrite your description as a new page within the
>>> currently very sparse OOXML documentation.
>>>
>>> BTW - the www.openxml4j.org domain has gone away and I am going to  
>>> need help from you in deciding additional documentation and OPC  
>>> examples that we should include for the OOXML sub-project.
>>>
>>> Regards,
>>> Dave
>>>
>>> On Nov 16, 2009, at 8:53 AM, Yegor Kozlov wrote:
>>>
>>>> Hi All,
>>>>
>>>> As we discussed at ApacheCon, one way to optimize the size of POI
>>>> distributions is to create a 'lite' version of the ooxml-schemas
>>>> jar.
>>>> The idea is simple: remove all unused classes and resources from  
>>>> the jar generated by XMLBeans. Rough estimations made at the  
>>>> Barcamp showed that POI uses less than 30% of the OOXML schemas,  
>>>> hence the optimized jar should be significantly smaller.
>>>>
>>>> With this in mind I created a simple utility called OOXMLLite,  
>>>> see http://svn.apache.org/repos/asf/poi/trunk/src/ooxml/java/org/apache/poi/util/OOXMLLite.java
>>>>
>>>> The process includes four simple steps:
>>>>
>>>> - run all ooxml unit tests
>>>> - see what classes from the ooxml-schemas.jar are loaded in the JVM
>>>> - copy the loaded classes into some directory.
>>>> - copy the binary resources (.xsb)
>>>>
>>>> A good acceptance test is to run the ooxml unit tests against the  
>>>> 'lite' classes - all should pass. There is an accompanying Ant  
>>>> task ooxml-xsds-lite for that, see build.xml.
>>>>
>>>> The resulting 'lite' jar is much smaller:
>>>> ooxml-schemas-lite-3.6-beta1.jar is only 3.5 MB while the 'big'
>>>> ooxml-schemas-1.0.jar is 14.5 MB. In theory, the size can be trimmed
>>>> down below 3 MB - my utility copies all .xsb files and does not yet
>>>> track resource dependencies.
>>>>
>>>> I propose to include ooxml-schemas-lite in the release cycle. The  
>>>> artifact name is ooxml-schemas-lite-${version.id}.jar.
>>>> Interested projects (first of all I mean Apache Tika) can setup  
>>>> their Maven poms to use <artifactId>poi-ooxml-lite</artifactId>
>>>> instead of <artifactId>poi-ooxml</artifactId>. This will reduce
>>>> the distribution size by approximately 10 MB.
>>>>
>>>> Yegor
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
>>>> For additional commands, e-mail: dev-help@poi.apache.org
>>>>

