cocoon-users mailing list archives

From "Hank Heidt" <>
Subject RE: ms word xml and embedded images
Date Thu, 21 Apr 2005 19:10:14 GMT

Are you sure that the embedded equation image is being stored as a
base64 encoded metafile?

I believe that Word 2003 XML instead serializes images (or at least most
images) as base64-encoded PNGs.
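
One way to settle the PNG-versus-metafile question is to decode a
w:binData payload and look at its first bytes. Below is a minimal Python
sketch, outside of Cocoon; the function name sniff_bindata is
hypothetical, and the WMF check only recognises the optional "placeable"
header, so a metafile without that header would be reported as unknown.

```python
import base64

# Well-known file signatures (leading bytes).
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
GZIP_MAGIC = b"\x1f\x8b"
PLACEABLE_WMF_MAGIC = b"\xd7\xcd\xc6\x9a"  # 0x9AC6CDD7, little-endian


def sniff_bindata(b64_text: str) -> str:
    """Guess the payload type of a base64 string copied from w:binData."""
    # b64decode ignores the line breaks Word puts inside the string.
    raw = base64.b64decode(b64_text)
    if raw.startswith(PNG_MAGIC):
        return "png"
    if raw.startswith(GZIP_MAGIC):
        return "gzip"
    if raw.startswith(PLACEABLE_WMF_MAGIC):
        return "wmf"
    return "unknown"
```

If this reports "gzip", the payload still has to be decompressed before
the real format can be checked.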

For a document reporting application, I can successfully extract Word
images and convert them to JPEGs by using the extractor transformer and
the SVG serializer. I have not tried this with embedded equation images.

In the extractor pipeline I convert an extracted Word <pict> element to
an SVG element by using the following XSL:

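<xsl:template match="/">
  <xsl:variable name="style" select="descendant::v:shape/@style"/>
  <xsl:variable name="width"
    select="substring-before(substring-after($style, 'width:'),';')"/>
  <!-- assumed: the second argument was cut off in the archive -->
  <xsl:variable name="height" select="substring-after($style, 'height:')"/>

  <svg xmlns="http://www.w3.org/2000/svg">
    <!-- assumed: the select values below were cut off in the archive -->
    <xsl:attribute name="width"><xsl:value-of select="$width"/></xsl:attribute>
    <xsl:attribute name="height"><xsl:value-of select="$height"/></xsl:attribute>
    <title>Embedded Word 2003 Image</title>

    <image xmlns:xlink="http://www.w3.org/1999/xlink">
      <xsl:attribute name="width"><xsl:value-of select="$width"/></xsl:attribute>
      <xsl:attribute name="height"><xsl:value-of select="$height"/></xsl:attribute>
      <!-- assumed: the path to w:binData was cut off in the archive -->
      <xsl:attribute name="xlink:href">data:image/png;base64,<xsl:value-of
        select="descendant::w:binData"/></xsl:attribute>
    </image>
  </svg>
</xsl:template>
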

The created SVG element contains the original Word base64 "w:binData"
string as a data URI in the xlink:href attribute of an <svg:image
xlink:href="data:image/png;base64, STRING GOES HERE..."> tag.
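
For anyone who wants to try the same idea outside of Cocoon, here is a
rough Python sketch of what the stylesheet does: read width and height
from the VML style string and wrap the base64 data in an SVG image with
a data URI. The helper names css_length and pict_to_svg are
hypothetical, and the sketch assumes the payload really is a PNG.

```python
from xml.sax.saxutils import quoteattr


def css_length(style: str, prop: str) -> str:
    """Return one property value from a style string such as
    'width:120pt;height:48pt', or '' if the property is absent."""
    for declaration in style.split(";"):
        name, _, value = declaration.partition(":")
        if name.strip().lower() == prop:
            return value.strip()
    return ""


def pict_to_svg(style: str, b64_png: str) -> str:
    """Build an SVG document that embeds the base64 PNG as a data URI."""
    width = css_length(style, "width")
    height = css_length(style, "height")
    # Remove the line breaks Word inserts into the base64 text.
    href = "data:image/png;base64," + "".join(b64_png.split())
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" '
        'xmlns:xlink="http://www.w3.org/1999/xlink" '
        f"width={quoteattr(width)} height={quoteattr(height)}>"
        "<title>Embedded Word 2003 Image</title>"
        f"<image width={quoteattr(width)} height={quoteattr(height)} "
        f"xlink:href={quoteattr(href)}/>"
        "</svg>"
    )
```

Unlike the substring-before/substring-after approach, splitting on ";"
does not depend on the order of the properties in the style attribute.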


-----Original Message-----
From: Stavros Kounis [] 
Sent: Thursday, April 21, 2005 9:15 AM
Subject: ms word xml and embedded images

hi all

i want to publish xml files that are produced by microsoft word
(save as xml)

the problem i have to solve is related to embedded equation images

when the document has an equation symbol, the produced xml has a base64
string that
contains a gzipped windows metafile. the way to get this windows metafile is:

1. base64 decoding of the string
2. gunzip the result (1)
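
The two steps above can be sketched in a few lines of Python. This is
only an illustration under the assumption stated in this message (that
the string is a gzip-compressed metafile); the function name
decode_embedded_metafile is hypothetical. If the decoded bytes do not
start with the gzip signature, they are returned unchanged.

```python
import base64
import gzip

GZIP_MAGIC = b"\x1f\x8b"


def decode_embedded_metafile(b64_text: str) -> bytes:
    """Step 1: base64-decode the string. Step 2: gunzip the result."""
    decoded = base64.b64decode(b64_text)
    if decoded.startswith(GZIP_MAGIC):
        return gzip.decompress(decoded)
    # Not gzip-compressed after all: hand back the decoded bytes as-is.
    return decoded
```

The returned bytes can then be written to a file for inspection, e.g.
open("equation.wmf", "wb").write(decode_embedded_metafile(text)).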

do you know if anyone has already done some work along these lines?

if not, what do you think is the best approach?

i'm thinking about a custom generator that parses the .xml doc for
embedded image strings
and saves them to disk (here of course i don't know if i'm able to
convert a windows metafile to .png or .gif)
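
As a rough illustration of the "parse the doc and save the images" idea,
here is a standalone Python sketch rather than a Cocoon generator. It
assumes the images sit in w:binData elements in the Word 2003 WordML
namespace and that the w:name attribute looks like
"wordml://01000001.png"; the function name extract_bindata and the
fallback file name are hypothetical. It does not convert metafiles to
.png or .gif.

```python
import base64
import xml.etree.ElementTree as ET
from pathlib import Path

# Namespace used for the w: prefix in Word 2003 XML.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"


def extract_bindata(xml_path, out_dir):
    """Write every w:binData payload in the document to out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    tree = ET.parse(str(xml_path))
    for index, node in enumerate(tree.iter(f"{{{W_NS}}}binData")):
        # Assumption: w:name ends with a usable file name.
        name = node.get(f"{{{W_NS}}}name", "")
        filename = name.rsplit("/", 1)[-1] or f"image{index}.bin"
        target = out / filename
        target.write_bytes(base64.b64decode(node.text or ""))
        written.append(target)
    return written
```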

i will appreciate any hint



To unsubscribe, e-mail:
For additional commands, e-mail:
