ode-dev mailing list archives

From "Karthick Sankarachary (JIRA)" <j...@apache.org>
Subject [jira] Commented: (ODE-472) utf-8 encoding is handled incorrectly within xslt stylesheets
Date Tue, 13 Jan 2009 20:49:01 GMT

    [ https://issues.apache.org/jira/browse/ODE-472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12663469#action_12663469 ]

Karthick Sankarachary commented on ODE-472:
-------------------------------------------

Alexey,

I think you hit the nail on the head. The bottom line is that when you construct the StreamSource,
you should use either a byte stream, in which case the XML parser resolves the character encoding
for you, or a reader, in which case the character encoding must already have been resolved.
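To illustrate the stream-versus-reader distinction, here is a minimal sketch using the standard
JAXP API (javax.xml.transform). The class and method names (StreamSourceDemo, fromBytes,
fromReader, rootText) are made up for this example and are not ODE code. Handing the parser raw
bytes lets it apply the encoding declaration itself; a Reader is only right if the charset given
to it was already the correct one.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

// Hypothetical demo class, not part of ODE.
final class StreamSourceDemo {
    // Preferred: pass raw bytes so the XML parser reads the encoding
    // declaration (or BOM) and decodes the document itself.
    static StreamSource fromBytes(byte[] xml) {
        return new StreamSource(new ByteArrayInputStream(xml));
    }

    // Only correct if 'charset' really is the document's encoding: the parser
    // receives characters, so it cannot apply the encoding declaration.
    static StreamSource fromReader(byte[] xml, Charset charset) {
        Reader reader = new InputStreamReader(new ByteArrayInputStream(xml), charset);
        return new StreamSource(reader);
    }

    // Parse the source via an identity transform and return the root element's text.
    static String rootText(StreamSource source) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        DOMResult result = new DOMResult();
        identity.transform(source, result);
        Document doc = (Document) result.getNode();
        return doc.getDocumentElement().getTextContent();
    }
}
```

For a UTF-8 document containing `<a>à</a>`, fromBytes yields "à", while fromReader with
ISO-8859-1 yields "Ã" followed by a non-breaking space, the same corruption described in the
issue.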

Looking at your patch, if the body of the stylesheet has already been initialized (as is usually
the case), odds are that we won't be using the right character encoding. As you pointed out,
we need to fix the BpelCompiler.loadXsltSheet(URI) method so that it is aware of XML encoding
declarations.
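One way to make stylesheet loading encoding-aware, sketched under the assumption that the sheet
body is cached after loading: keep it as raw bytes rather than a String, and give the parser a
byte stream plus a system ID. EncodingAwareSheetLoader and its methods are hypothetical; the real
signature and caching behavior of BpelCompiler.loadXsltSheet(URI) may differ.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringReader;
import java.io.StringWriter;
import java.net.URI;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: the sheet body stays as bytes, so the XML parser
// (not our code) decides the charset from the BOM or encoding declaration.
final class EncodingAwareSheetLoader {
    // Read the stylesheet without decoding it (requires Java 9+ for readAllBytes).
    static byte[] loadSheetBytes(URI uri) throws Exception {
        try (InputStream in = uri.toURL().openStream()) {
            return in.readAllBytes();
        }
    }

    // Compile from a byte stream; the system ID keeps relative imports/includes resolvable.
    static Templates compile(byte[] sheet, URI systemId) throws Exception {
        StreamSource src = new StreamSource(new ByteArrayInputStream(sheet), systemId.toString());
        return TransformerFactory.newInstance().newTemplates(src);
    }

    // Apply the compiled sheet to a small in-memory input document.
    static String apply(Templates templates, String inputXml) throws Exception {
        StringWriter out = new StringWriter();
        templates.newTransformer().transform(
                new StreamSource(new StringReader(inputXml)), new StreamResult(out));
        return out.toString();
    }
}
```

Because the bytes are never turned into a String by hand, a sheet saved as ISO-8859-1 (and
declaring so) is decoded just as correctly as one saved as UTF-8.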

The trick is to auto-detect the XML character encoding as described in http://www.w3.org/TR/REC-xml/#sec-guessing-no-ext-info.
You might want to consider using org.apache.xmlbeans.impl.common.XmlEncodingSniffer, or a similar
helper class that does the grunt work for you.
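For reference, a simplified version of that detection logic might look like the sketch below.
XmlEncodingGuess is a hypothetical helper written for illustration; it is not the
XmlEncodingSniffer API. It covers the UTF-8 and UTF-16 byte order marks, BOM-less UTF-16, and
the encoding declaration of ASCII-compatible documents, and it deliberately skips the UCS-4 and
EBCDIC cases from the spec.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not the XMLBeans API: simplified XML 1.0 encoding autodetection.
final class XmlEncodingGuess {
    // Matches the encoding pseudo-attribute inside a leading <?xml ... ?> declaration.
    private static final Pattern ENCODING_DECL = Pattern.compile(
            "^<\\?xml[^>]*?encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    static String guess(byte[] b) {
        if (startsWith(b, 0xEF, 0xBB, 0xBF)) return "UTF-8";      // UTF-8 BOM
        if (startsWith(b, 0xFE, 0xFF)) return "UTF-16BE";         // UTF-16 BOM, big-endian
        if (startsWith(b, 0xFF, 0xFE)) return "UTF-16LE";         // UTF-16 BOM, little-endian
        if (startsWith(b, 0x00, 0x3C, 0x00, 0x3F)) return "UTF-16BE"; // "<?" without BOM
        if (startsWith(b, 0x3C, 0x00, 0x3F, 0x00)) return "UTF-16LE"; // "<?" without BOM
        if (startsWith(b, 0x3C, 0x3F, 0x78, 0x6D)) {                  // "<?xm" in an ASCII-compatible charset
            // The declaration is pure ASCII, so decoding the head as Latin-1 is safe.
            String head = new String(b, 0, Math.min(b.length, 200), StandardCharsets.ISO_8859_1);
            Matcher m = ENCODING_DECL.matcher(head);
            if (m.find()) return m.group(1);
        }
        return "UTF-8"; // XML default when there is no BOM and no encoding declaration
    }

    private static boolean startsWith(byte[] b, int... prefix) {
        if (b.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((b[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

The returned name can then be passed to `new InputStreamReader(in, name)` when a Reader is
unavoidable.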

Regards,
Karthick

> utf-8 encoding is handled incorrectly within xslt stylesheets
> -------------------------------------------------------------
>
>                 Key: ODE-472
>                 URL: https://issues.apache.org/jira/browse/ODE-472
>             Project: ODE
>          Issue Type: Bug
>          Components: BPEL Runtime
>    Affects Versions: 1.2
>            Reporter: Alexey Ousov
>         Attachments: ODE-472-quickfix.patch, ODE-472.patch, test1.par.zip
>
>
> The bug occurs when UTF-8 encoded symbols appear either within the stylesheet itself or inside
> documents referenced with the document() function. All such symbols are encoded twice.
> So if we have something like this in the XSLT:
> <xsl:value-of select="&#x00e0;" />
> which is UTF-8 encoded as "C3 A0", then in the result node we get the sequence "C3 83 C2 A0",
> which is the UTF-8 encoding of "&#x00c3;&#x00a0;".
> The cause of the bug is the XslRuntimeUriResolver class, which reads files into a string without
> detecting the file encoding. I made a quick fix, which fixes only the document() function with
> the XPath 1.0 runtime. Deeper investigation is needed, so hopefully a full fix will be available
> after New Year.
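The byte sequence in the report is what you get when UTF-8 bytes are decoded with a single-byte
charset and the resulting string is encoded as UTF-8 again. The sketch below reproduces it;
ISO-8859-1 is an assumption standing in for whatever charset the resolver actually used when
reading files into a string (windows-1252 gives the same result for these two bytes).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class reproducing the double encoding described in ODE-472.
final class DoubleEncodingDemo {
    // Decode UTF-8 bytes with the wrong charset, then encode the string as UTF-8 again.
    // With ISO-8859-1 as the wrong charset, every byte >= 0x80 turns into two bytes.
    static byte[] misdecodeAndReencode(byte[] utf8Bytes, Charset wrongCharset) {
        String misread = new String(utf8Bytes, wrongCharset);
        return misread.getBytes(StandardCharsets.UTF_8);
    }

    // Format bytes as space-separated uppercase hex, e.g. "C3 A0".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }
}
```

Calling `hex(misdecodeAndReencode("\u00e0".getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1))`
returns "C3 83 C2 A0", matching the sequence observed in the result node.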

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

