synapse-dev mailing list archives

From Andreas Veithen <andreas.veit...@skynet.be>
Subject Resolving SYNAPSE-218
Date Sun, 09 Mar 2008 01:01:26 GMT
Hi all!

In order to resolve SYNAPSE-218 (TextFileDataSource violates  
OMDataSource contract), I think we need to review the following piece  
of code in VFSTransportSender#populateResponseFile (there is similar  
code in MailTransportSender#sendMail):

if (firstChild instanceof OMSourcedElementImpl) {
     firstChild.serializeAndConsume(os);
} else {
     os.write(firstChild.getText().getBytes());
}

The purpose of this piece of code is to write the content of
{http://ws.apache.org/commons/ns/payload}text elements to the response
file. OMSourcedElementImpl nodes are
handled differently to ensure that large output from XSL  
transformations is processed efficiently (i.e. without loading the  
entire temporary output file into memory). The code has two problems:

1) Since OMSourcedElementImpl implements OMElement, a call to
serializeAndConsume should normally write out the entire element, i.e.  
start tag, content (encoded as XML) and end tag. This is of course not
what is intended here. The code only works as expected because
TextFileDataSource doesn't respect the OMDataSource contract (which is  
what SYNAPSE-218 is all about).

2) The instruction os.write(firstChild.getText().getBytes()) will  
encode the content of the element using the default platform encoding,  
which is not always what is expected. Note that for the output of an  
XSL transformation, the content of the element is produced by the  
following instruction in XSLTMediator#performXSLT:

handleNonXMLResult(baosForTarget.toString(), traceOrDebugOn, traceOn)

While considered separately this is also incorrect (since  
ByteArrayOutputStream#toString uses the default platform encoding,  
while the encoding of the stream depends on the stylesheet), in most  
cases the net result is indeed that the response file will have the  
encoding specified in the stylesheet. However this only works if the  
combined transformation ByteArrayOutputStream#toString ->  
String#getBytes is equivalent to the identity transformation. This is  
not the case if the default platform encoding is e.g. UTF-8.
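
To illustrate the round-trip argument above, here is a minimal sketch. It
is not Synapse code: the class EncodingRoundTrip and the method roundTrip
are hypothetical names, and the platform default encoding is passed in
explicitly so that both cases can be shown in one run. It decodes bytes
to a String and encodes them back, as ByteArrayOutputStream#toString
followed by String#getBytes would do.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration class, not part of Synapse.
class EncodingRoundTrip {

    // Decode and re-encode with the same charset, standing in for
    // ByteArrayOutputStream#toString followed by String#getBytes.
    static byte[] roundTrip(byte[] raw, Charset platformDefault) {
        String decoded = new String(raw, platformDefault);
        return decoded.getBytes(platformDefault);
    }

    public static void main(String[] args) {
        // Stylesheet output encoded as ISO-8859-1: the last byte is 0xE9.
        byte[] raw = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);

        // A single-byte default encoding maps every byte to a char and back.
        System.out.println(
            Arrays.equals(raw, roundTrip(raw, StandardCharsets.ISO_8859_1))); // true

        // With UTF-8 as default, the lone 0xE9 is malformed input and is
        // replaced by U+FFFD, so the original bytes are lost.
        System.out.println(
            Arrays.equals(raw, roundTrip(raw, StandardCharsets.UTF_8))); // false
    }
}
```

The second case is the one that corrupts the response file: the
replacement character is written out in place of the original byte.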

The solution for the first problem is actually surprisingly simple.  
The correct behavior can be achieved by replacing the code by the  
following instructions:

OMNode node = firstChild.getFirstOMChild();
while (node != null) {
     if (node instanceof OMText) {
         os.write(((OMText)node).getText().getBytes());
     }
     node = node.getNextOMSibling();
}

I checked that for an OMSourcedElementImpl node backed by a
TextFileDataSource object, getFirstOMChild and getNextOMSibling will  
read a single chunk of text from the WrappedTextNodeStreamReader  
constructed by TextFileDataSource. Therefore the replacement code will  
handle large temporary files with the same efficiency as the original  
code.

An obvious solution for the second problem is to allow the
configuration of the output encoding in the VFS transport (and to  
correct XSLTMediator!). However there might be cases where the user  
wants to specify the output encoding in the XSLT stylesheet. This  
could be achieved by allowing XSLTMediator to be configured to use a  
binary wrapper instead of a text wrapper for text output. In this case  
Synapse would strictly preserve the output of the XSL transformation.
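
The two options could look roughly like the following sketch. Everything
here is an assumption for illustration: OutputEncodingSketch, writeText
and writeBinary are hypothetical names, not existing Synapse or Axiom
APIs, and the way the charset would be configured on the transport is
left open.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not part of Synapse.
class OutputEncodingSketch {

    // Text wrapper: the transport encodes the text with a configured
    // charset instead of the platform default.
    static void writeText(String text, OutputStream os, Charset configured)
            throws IOException {
        Writer writer = new OutputStreamWriter(os, configured);
        writer.write(text);
        writer.flush(); // flush only; the caller owns and closes the stream
    }

    // Binary wrapper: the bytes produced by the transformation are
    // copied unchanged, so the stylesheet's encoding is preserved.
    static void writeBinary(byte[] transformerOutput, OutputStream os)
            throws IOException {
        os.write(transformerOutput);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream asText = new ByteArrayOutputStream();
        writeText("caf\u00e9", asText, StandardCharsets.UTF_8);
        System.out.println(asText.size()); // 5: the accented char takes 2 bytes

        ByteArrayOutputStream asBinary = new ByteArrayOutputStream();
        writeBinary("caf\u00e9".getBytes(StandardCharsets.ISO_8859_1), asBinary);
        System.out.println(asBinary.size()); // 4: bytes passed through as-is
    }
}
```

In the first variant the encoding is a property of the transport; in the
second it is decided entirely by the stylesheet.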

I'm waiting for your comments!

Andreas

