axis-java-dev mailing list archives

From Dennis Sosnoski <...@sosnoski.com>
Subject Re: [Axis2] Binary Serialisation
Date Fri, 29 Jul 2005 04:56:00 GMT
Looks like we've got a thread going, Eran!

Dan, I don't think anyone has done a performance analysis for a typed 
parser as such. It'd really need to be done in the context of some sort 
of data binding framework to be meaningful. The only thing which has 
been done along these lines that I'm aware of is Sun's "FAST Web 
Services", which merged mutant forms of JAXB and JAX-RPC so that they 
could do binary input/output. In their case they used ASN.1 
encoding/decoding of the binary data, with the ASN.1 representation 
generated from an XML Schema.

They saw much faster performance than the conventional JAX-RPC code. 
But my own JibxSoap (a subproject of JiBX, http://www.jibx.org) 
delivers performance that appears to be about as good while still using 
standard text XML. I say "appears to be" because at the time I did the 
web services performance comparisons 
(http://www.sosnoski.com/presents/cleansoap/comparing.html) the Sun 
stuff was all proprietary. They've since opened it up on java.net, I 
think, though I don't know what kind of license restrictions might apply.

My own gut feeling is that if I used a typed parser interface for binary 
input/output with JiBX/JibxSoap I could probably get 2-2.5 X the 
processing speed of text (vs. probably about 1.4-1.8 X with my XBIS 
binary XML format, which still keeps values as text and can be 
translated to and from the text representation).
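
To make the round trip concrete, here is a minimal sketch of the two access
paths over the same binary data. TextValueReader, TypedValueReader and
BinaryIntSource are hypothetical names made up for illustration; they are
not part of StAX, JiBX or XBIS, and the 4-byte big-endian int layout is
just an assumption for the example.

```java
import java.nio.ByteBuffer;

// Hypothetical interfaces, for illustration only.
interface TextValueReader {
    String getText();   // value handed over as text
}

interface TypedValueReader {
    int getInt();       // value handed over already typed
}

// Reads consecutive 4-byte big-endian ints from a byte array (assumed layout).
final class BinaryIntSource implements TextValueReader, TypedValueReader {
    private final ByteBuffer buf;

    BinaryIntSource(byte[] data) {
        this.buf = ByteBuffer.wrap(data);
    }

    // Text path: decode the int, then allocate a String that the
    // binding code has to parse back into an int.
    public String getText() {
        return Integer.toString(buf.getInt());
    }

    // Typed path: pass the decoded value straight through.
    public int getInt() {
        return buf.getInt();
    }
}

final class TypedReaderSketch {
    public static void main(String[] args) {
        byte[] data = ByteBuffer.allocate(8).putInt(12345).putInt(12345).array();
        BinaryIntSource src = new BinaryIntSource(data);

        int viaText = Integer.parseInt(src.getText()); // int -> String -> int
        int viaTyped = src.getInt();                   // int only

        System.out.println(viaText + " " + viaTyped);
    }
}
```

Both paths give the same value; the typed path just skips the String
allocation and the re-parse.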

There are actually some other areas where parser usability could be 
improved, though, besides implementing a typed interface. I think 
implementing a parser that supplied element and attribute names as 
singleton QName objects of some form (rather than separate namespace 
URI, local name, and qualified name text values) would be a big gain, 
for instance. The text APIs could also be better designed. In the case 
of the StAX XMLStreamReader, rather than returning an array plus start 
offset plus length for element content, each through a separate method 
call, it'd be cleaner to just return the equivalent of a JDK 1.5 
CharSequence (which could be reusable). Likewise for attribute values, 
where StAX returns Strings. Returning CharSequence-equivalents would not only avoid 
unnecessary String creation (in the case of attribute values), it would 
also eliminate the need to translate the raw byte stream to character 
arrays for common encodings (especially the UTF-8 and UTF-16 used in 
BP-compliant web services).
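
A rough sketch of both ideas, assuming nothing about how a real parser
would implement them. InternedName and CharSlice are hypothetical classes
invented for this example; a real parser would more likely resolve names
from its own symbol table than from a synchronized map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical name type: one shared instance per (namespace URI, local name).
final class InternedName {
    private static final Map<String, Map<String, InternedName>> POOL = new HashMap<>();

    final String uri;
    final String local;

    private InternedName(String uri, String local) {
        this.uri = uri;
        this.local = local;
    }

    // Returns the same object for the same pair, so callers can compare with ==.
    static synchronized InternedName of(String uri, String local) {
        return POOL.computeIfAbsent(uri, u -> new HashMap<>())
                   .computeIfAbsent(local, l -> new InternedName(uri, local));
    }
}

// Hypothetical reusable view over a region of the parser's character buffer.
final class CharSlice implements CharSequence {
    private char[] buf = new char[0];
    private int start;
    private int len;

    // Re-point this view at another region; no characters are copied.
    CharSlice reset(char[] buf, int start, int len) {
        this.buf = buf;
        this.start = start;
        this.len = len;
        return this;
    }

    public int length() {
        return len;
    }

    public char charAt(int index) {
        if (index < 0 || index >= len) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return buf[start + index];
    }

    public CharSequence subSequence(int from, int to) {
        if (from < 0 || to > len || from > to) {
            throw new IndexOutOfBoundsException(from + ".." + to);
        }
        return new CharSlice().reset(buf, start + from, to - from);
    }

    // A String is only created if the caller actually asks for one.
    public String toString() {
        return new String(buf, start, len);
    }
}

final class NameAndTextSketch {
    public static void main(String[] args) {
        InternedName a = InternedName.of("urn:example", "count");
        InternedName b = InternedName.of("urn:example", "count");
        System.out.println(a == b); // identity comparison, no string compares

        char[] buffer = "<count>42</count>".toCharArray();
        CharSlice text = new CharSlice().reset(buffer, 7, 2);

        // Convert directly from the view without building a String.
        int value = 0;
        for (int i = 0; i < text.length(); i++) {
            value = value * 10 + (text.charAt(i) - '0');
        }
        System.out.println(value);
    }
}
```

The point of the slice is that the binding code can convert values straight
out of the parser's buffer, and only pays for a String when it needs one.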

Unfortunately, I think developers sometimes misapply Knuth's (or Hoare's 
- I'm not sure who got this started) "premature optimization is the root 
of all evil" aphorism by designing APIs without any thought to 
performance. Once performance bottlenecks have been built into the APIs 
it's very difficult to get around them without scrapping things and 
starting over.

  - Dennis

Dan Diephouse wrote:

> Has anyone done any performance tests (binary or just plain text) with 
> the typed stax stuff? Does it really make a difference?
> - Dan
>
> Eran Chinthaka wrote:
>
>> Hi Dennis,
>>
>> You have commented on typed pull parser in wiki. Shall we start a thread
>> about it here ?
>>
>> -- EC
>>
>>  
>>
>>> -----Original Message-----
>>> From: Apache Wiki [mailto:wikidiffs@apache.org]
>>> Sent: Thursday, July 28, 2005 10:31 PM
>>> To: general@ws.apache.org
>>> Subject: [Ws Wiki] Update of 
>>> "FrontPage/Axis2/Tasks/BinarySerialization"
>>> by DennisSosnoski
>>>
>>> Dear Wiki user,
>>>
>>> You have subscribed to a wiki page or wiki category on "Ws Wiki" for
>>> change notification.
>>>
>>> The following page has been changed by DennisSosnoski:
>>> http://wiki.apache.org/ws/FrontPage/Axis2/Tasks/BinarySerialization
>>>
>>> ------------------------------------------------------------------------------
>>>  decoding the binary into an int, converting to a string for the parser
>>>  API and then back to an int in the deserialisation code.
>>>
>>> + I (DennisSosnoski) would personally disagree with the above assessment.
>>> A typed pull parser would definitely be nice, but even without this you
>>> can get substantial size and performance gains from a binary format.
>>> See my articles on devWorks at
>>> http://www-128.ibm.com/developerworks/xml/library/x-trans1.html and
>>> http://www-128.ibm.com/developerworks/xml/library/x-trans2/index.html
>>> for examples.
>>> +
