poi-dev mailing list archives

From bugzi...@apache.org
Subject DO NOT REPLY [Bug 47668] New: OOXML is parsed as tree, but PPTX is a graph
Date Mon, 10 Aug 2009 06:36:01 GMT
https://issues.apache.org/bugzilla/show_bug.cgi?id=47668

           Summary: OOXML is parsed as tree, but PPTX is a graph
           Product: POI
           Version: 3.5-dev
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: critical
          Priority: P2
         Component: POI Overall
        AssignedTo: dev@poi.apache.org
        ReportedBy: stefan.stern@mind8.com


--- Comment #0 from Stefan Stern <stefan.stern@mind8.com> 2009-08-09 23:35:58 PDT ---
'POIXMLDocumentPart' and 'POIXMLDocument' parse an OOXML document by locating
the main part of the package, which is represented by an instance of a subclass
of 'POIXMLDocument', and then invoking the method "read(POIXMLFactory factory)"
recursively on all relationships to other PackageParts.

This works fine for Excel and Word files, as these seem to be trees for the
most part. In PowerPoint, the Slide, SlideLayout and SlideMaster form a graph.
Each Slide has a relationship to its SlideLayout. Each SlideLayout has a
relationship to its SlideMaster. Each SlideMaster has relationships to all of
its SlideLayouts. And the presentation has relationships to all Slides and
SlideMasters.

When using the existing classes in my current XSLF implementation, I end up in
an endless loop. The only way to avoid this is to pass a context object to the
loading classes, in which all loaded PackageParts and their corresponding XSLF
classes are cached. This avoids any loops, and every POIXML instance can be
linked with its related parts.
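
A minimal sketch of the context-object idea described above. It is not the
attached patch and does not use the POI API: Part, DocPart and GraphLoader are
hypothetical stand-ins for PackagePart, POIXMLDocumentPart and the loading
code, and the relationship layout is assumed from the description of
presentation, Slide, SlideLayout and SlideMaster.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a PackagePart: a name plus outgoing relationships.
final class Part {
    final String name;
    final List<Part> relations = new ArrayList<>();
    Part(String name) { this.name = name; }
}

// Hypothetical stand-in for a POIXMLDocumentPart-like wrapper object.
final class DocPart {
    final Part part;
    final List<DocPart> related = new ArrayList<>();
    DocPart(Part part) { this.part = part; }
}

final class GraphLoader {
    // Returns the wrapper for 'part', creating it only on the first visit.
    static DocPart load(Part part, Map<Part, DocPart> context) {
        DocPart existing = context.get(part);
        if (existing != null) {
            return existing; // already loaded: reuse it, do not recurse again
        }
        DocPart wrapper = new DocPart(part);
        // Register before recursing so that a cycle back to this part terminates.
        context.put(part, wrapper);
        for (Part target : part.relations) {
            wrapper.related.add(load(target, context));
        }
        return wrapper;
    }

    // Builds the assumed slide/layout/master cycle and loads it once.
    static int loadedCountForSlideCycle() {
        Part presentation = new Part("presentation");
        Part slide = new Part("slide");
        Part layout = new Part("layout");
        Part master = new Part("master");
        presentation.relations.add(slide);
        presentation.relations.add(master);
        slide.relations.add(layout);
        layout.relations.add(master);
        master.relations.add(layout); // closes the cycle layout <-> master

        Map<Part, DocPart> context = new HashMap<>();
        load(presentation, context);
        return context.size(); // one wrapper per distinct part
    }
}
```

Because the wrapper is placed in the map before its relationships are
followed, the master's relationship back to the layout resolves to the wrapper
that already exists instead of starting another descent.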

Storing is analogous, although a Set is sufficient to prevent endless loops.
When storing the document, a Set is passed as context. When invoking all
related parts recursively, only those not yet contained in the Set are stored.
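
The storing side can be sketched the same way. Again, this is only an
illustration under assumptions: SaveNode and GraphSaver are hypothetical names,
and appending to a list stands in for actually writing a part.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a document part that knows its related parts.
final class SaveNode {
    final String name;
    final List<SaveNode> related = new ArrayList<>();
    SaveNode(String name) { this.name = name; }
}

final class GraphSaver {
    // Stores 'node' and everything reachable from it, each part at most once.
    static void save(SaveNode node, Set<SaveNode> alreadySaved, List<String> written) {
        // Set.add returns false if the node was already in the set.
        if (!alreadySaved.add(node)) {
            return;
        }
        written.add(node.name); // placeholder for writing the part's content
        for (SaveNode next : node.related) {
            save(next, alreadySaved, written);
        }
    }

    // Saves the assumed slide/layout/master cycle and reports the write order.
    static List<String> saveSlideCycle() {
        SaveNode presentation = new SaveNode("presentation");
        SaveNode slide = new SaveNode("slide");
        SaveNode layout = new SaveNode("layout");
        SaveNode master = new SaveNode("master");
        presentation.related.add(slide);
        presentation.related.add(master);
        slide.related.add(layout);
        layout.related.add(master);
        master.related.add(layout);

        List<String> written = new ArrayList<>();
        save(presentation, new HashSet<>(), written);
        return written;
    }
}
```

A Set is enough here because storing only needs to know whether a part has
been handled, not which object it was mapped to.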

The change in this behavior implies additional changes: An image used in
multiple places is referenced from each of those places, but all references
lead to the same PackagePart. There is no 1:1 mapping between appearances in
the document and PackageParts. Currently, every relation ends up in a separate
instance each time an image is referenced, but all of these instances point to
the same PackagePart. This may cause weird behavior when saving the document.

Example: You have an Excel xlsx file that contains two sheets. On each sheet,
you have inserted the same JPG image. If you look into the file, you will see
that both sheet objects refer to the very same image PackagePart. In POI,
however, these references end up as two instances of XSSFPicture. The two
instances refer to the same PackagePart and yet represent two different objects
in Excel. If one of the images is altered, then depending on the order in which
the parts are stored, the change may affect the other image as well, or your
modification may not be persisted at all.
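
One way such an order-dependent result could arise is sketched below. This is
an assumed mechanism for illustration only, not a description of how
XSSFPicture actually behaves: ImagePart and PictureView are hypothetical
classes, and each view is assumed to buffer the content and write it back on
commit.

```java
// Hypothetical stand-in for the single shared image PackagePart.
final class ImagePart {
    String content;
    ImagePart(String content) { this.content = content; }
}

// Hypothetical wrapper that buffers the part's content (an assumption).
final class PictureView {
    final ImagePart part;
    String buffered;
    PictureView(ImagePart part) {
        this.part = part;
        this.buffered = part.content; // private copy taken at load time
    }
    void commit() { part.content = buffered; } // write the buffer back
}

final class SharedPartDemo {
    // Edit the first view, then store the first view before the second.
    static String editFirstThenStoreInOrder() {
        ImagePart shared = new ImagePart("original");
        PictureView onSheet1 = new PictureView(shared);
        PictureView onSheet2 = new PictureView(shared);
        onSheet1.buffered = "edited";
        onSheet1.commit();
        onSheet2.commit(); // stale buffer overwrites the edit
        return shared.content;
    }

    // Same edit, but the second view is stored first.
    static String editFirstThenStoreReversed() {
        ImagePart shared = new ImagePart("original");
        PictureView onSheet1 = new PictureView(shared);
        PictureView onSheet2 = new PictureView(shared);
        onSheet1.buffered = "edited";
        onSheet2.commit();
        onSheet1.commit(); // the edit is written last and applies to both sheets
        return shared.content;
    }
}
```

In the first order the edit is lost; in the second it survives but shows up on
both sheets. With exactly one wrapper per part there is only one buffer, so
the outcome no longer depends on the store sequence.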

See the patch attached to this bug, which enables a 1:1 mapping between
PackageParts and POIXMLDocumentPart objects. I will try to come up with a test
case for the Excel example.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org
