xmlgraphics-batik-users mailing list archives

From Jan Tošovský <j.tosov...@tiscali.cz>
Subject RE: Unicode entity resolved on reading document
Date Tue, 31 Mar 2009 19:42:27 GMT
Hi Paul,
your problem can be solved by cloaking the document before parsing and
uncloaking it afterwards. Cloaking hides the DTD and rewrites all entities so
that they pass through processing untouched. Uncloaking is the reverse
process. I had the same problem with one special operation on a DocBook file.
A Perl script can be found here:
I also have my own VBScript variant for Windows, so there is no need to
install Perl there.
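
To illustrate the idea, here is a minimal Python sketch of cloaking and uncloaking. It is not the Perl or VBScript tool mentioned above. The names `MARKER`, `cloak` and `uncloak` are hypothetical, the marker is assumed never to occur in the input, and only entity and character references are handled (hiding the DTD is left out).

```python
import re

# Hypothetical marker: a private-use character assumed not to occur in the input.
MARKER = "\uE000"

# Matches named, decimal and hexadecimal references, e.g. &amp; &#8212; &#x2014;
_REFERENCE = re.compile(r"&(#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z_][\w.\-]*);")


def cloak(text):
    """Swap the leading '&' of every reference for MARKER so a parser ignores it."""
    return _REFERENCE.sub(lambda m: MARKER + m.group(1) + ";", text)


def uncloak(text):
    """Reverse cloak(): put the '&' back in front of every cloaked reference."""
    return text.replace(MARKER, "&")
```

The intended use would be `cloak()` on the raw file text, then parse, edit and serialize, then `uncloak()` on the serialized text. This only round-trips if the serializer writes the marker character out literally (for example with UTF-8 output). While the document is cloaked, the application sees the marker text instead of the resolved character.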

From: thomas.deweese@kodak.com [mailto:thomas.deweese@kodak.com] 
Sent: Tuesday, March 31, 2009 11:47 AM
To: batik-users@xmlgraphics.apache.org
Cc: batik-users@xmlgraphics.apache.org
Subject: Re: Unicode entity resolved on reading document

Hi Paul,

Paul Wellner Bou <paul@purecodes.org> wrote on 03/31/2009 02:57:17 AM:

> thomas.deweese@kodak.com wrote:
> >    I think it's better to explain why this is a problem for you.
> > As long as the text encoding is correct there shouldn't be any
> > problem with replacing the character... So why is there a problem?
> The problem is not technical in this case. It is a question of slightly 
> correcting some data in the SVG and writing it to a new file which 
> should be as similar as possible to the original file. This is 
> required because the people who check the file will compare it 
> with the original, don't have much knowledge of XML/SVG, and will 
> reject it if there are modified lines which have nothing to do 
> with the correction.

   Then you will either need to educate them or write a tool that will 
operate on the raw text stream.  You could potentially write a 
post-processing step that entifies any characters that are outside of 
the 7-bit ASCII range.  It might give almost the same text as the input... 
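
Such a post-processing step could be sketched in Python as follows. The function name `entify_non_ascii` is hypothetical, and the choice of uppercase hexadecimal references is an assumption. If the original file used decimal or named references, the result will still differ from it.

```python
def entify_non_ascii(text):
    """Replace every character above U+007F with a hexadecimal character reference."""
    return "".join(
        ch if ord(ch) < 0x80 else "&#x{:X};".format(ord(ch))
        for ch in text
    )
```

For decimal references, Python's built-in `text.encode("ascii", "xmlcharrefreplace")` does the same job and returns bytes.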

> So it is not possible to use an XML parser without replacing entities?

   No. Even if it were, Batik would then fail on valid input such as: 
        <rect fill="&#x23;&#x46;&#x46;&#x30;&#x30;&#x30;&#x30;"
              x="0" y="0" width="200" height="200"/> 

   So it's likely not useful... 
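
To see why the references have to be resolved, here is a small Python sketch using the standard-library `xml.etree.ElementTree` parser rather than Batik. The names `SVG_RECT` and `resolved_fill` are hypothetical. It shows that the fill value only becomes a usable colour once the parser has replaced the character references.

```python
import xml.etree.ElementTree as ET

# The rect from the message above, written on one line.
SVG_RECT = (
    '<rect fill="&#x23;&#x46;&#x46;&#x30;&#x30;&#x30;&#x30;" '
    'x="0" y="0" width="200" height="200"/>'
)


def resolved_fill(xml_text):
    """Parse the element and return its fill attribute as the application sees it."""
    return ET.fromstring(xml_text).get("fill")


print(resolved_fill(SVG_RECT))  # prints "#FF0000"
```

A parser that left the references untouched would hand the application the literal reference text, which is not a valid colour value.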
