ws-users mailing list archives

From: "Jesus M. Salvo Jr." <jesus.sa...@migasia.com>
Subject: Bug: Entity references incorrectly sent
Date: Wed, 12 Nov 2003 02:43:57 GMT

Suppose I have the following code, which tries to send a pound symbol:

     Vector    params = new Vector();
     Hashtable hashParams = new Hashtable();
     hashParams.put( "msg", "&#163;" );
     params.add( hashParams );

     client.execute( "XMLRPCHandler.getContent", params );

The problem is that the XML-RPC API sends
    &#163;
as
    &amp;#163;

Of course, this problem affects any entity-encoded character, not
just the pound symbol.
The root of the problem is the org.apache.xmlrpc.XmlWriter.chardata()
method, which does the following:

    /**
     * Writes text as <code>PCDATA</code>.
     *
     * @param text The data to write.
     * @exception XmlRpcException Unsupported character data found.
     * @exception IOException Problem writing data.
     */
    protected void chardata(String text)
        throws XmlRpcException, IOException
    {
        int l = text.length ();
        for (int i = 0; i < l; i++)
        {
            char c = text.charAt (i);
            switch (c)
            {
            case '\t':
            case '\r':
            case '\n':
                write(c);
                break;
            case '<':
                write(LESS_THAN_ENTITY);
                break;
            case '>':
                write(GREATER_THAN_ENTITY);
                break;
            case '&':
                write(AMPERSAND_ENTITY);
                break;
            default:
                if (c < 0x20 || c > 0xff)
                {
                    // Though the XML-RPC spec allows any ASCII
                    // characters except '<' and '&', the XML spec
                    // does not allow this range of characters,
                    // resulting in a parse error from most XML
                    // parsers.
                    throw new XmlRpcException(0, "Invalid character data " +
                                              "corresponding to XML entity &#" +
                                              String.valueOf((int) c) + ';');
                }
                else
                {
                    write(c);
                }
            }
        }
    }
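
To make the behaviour easy to reproduce without a server, here is a minimal
standalone sketch that mirrors only the switch in chardata() above. The class
name ChardataDemo and the method escape() are hypothetical, the three
*_ENTITY constants are assumed to expand to &lt;, &gt; and &amp;, and
IllegalArgumentException stands in for XmlRpcException so that the sketch has
no library dependency.

```java
// Hypothetical standalone sketch; not part of the Apache XML-RPC library.
final class ChardataDemo {

    // Mirrors the escaping rules of chardata(), collecting output in a String.
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
            case '\t':
            case '\r':
            case '\n':
                out.append(c);
                break;
            case '<':
                out.append("&lt;");   // assumed value of LESS_THAN_ENTITY
                break;
            case '>':
                out.append("&gt;");   // assumed value of GREATER_THAN_ENTITY
                break;
            case '&':
                out.append("&amp;");  // assumed value of AMPERSAND_ENTITY
                break;
            default:
                if (c < 0x20 || c > 0xff) {
                    // Stand-in for the XmlRpcException thrown by the original.
                    throw new IllegalArgumentException(
                        "Invalid character data corresponding to XML entity &#"
                        + (int) c + ';');
                }
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The '&' of a hand-written reference is escaped like any other '&'.
        System.out.println(escape("&#163;"));
        // The literal character U+00A3 lies within 0x20..0xff and is written unchanged.
        System.out.println(escape("\u00a3"));
    }
}
```

Under these assumptions, every '&' in the input is rewritten regardless of
what follows it, while the literal pound character itself falls inside the
0x20..0xff range and is passed through untouched.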


Instead, it should follow the same logic as Apache Xerces.
In org.apache.xml.serialize.BaseMarkupSerializer:

    /**
     * Escapes a string so it may be printed as text content or attribute
     * value. Non printable characters are escaped using character references.
     * Where the format specifies a default entity reference, that reference
     * is used (e.g. <tt>&amp;lt;</tt>).
     *
     * @param source The string to escape
     */
    protected void printEscaped( String source )
        throws IOException
    {
        for ( int i = 0 ; i < source.length() ; ++i ) {
            int ch = source.charAt(i);
            if ((ch & 0xfc00) == 0xd800 && i+1 < source.length()) {
                int lowch = source.charAt(i+1);
                if ((lowch & 0xfc00) == 0xdc00) {
                    ch = 0x10000 + ((ch-0xd800)<<10) + lowch-0xdc00;
                    i++;
                }
            }
            printEscaped(ch);
        }
    }

    protected void printEscaped( int ch )
        throws IOException
    {
        String charRef;
        // If there is a suitable entity reference for this
        // character, print it. The list of available entity
        // references is almost but not identical between
        // XML and HTML.
        charRef = getEntityRef( ch );
        if ( charRef != null ) {
            _printer.printText( '&' );
            _printer.printText( charRef );
            _printer.printText( ';' );
        } else if ( ( ch >= ' ' && _encodingInfo.isPrintable((char)ch) && ch != 0xF7 ) ||
                    ch == '\n' || ch == '\r' || ch == '\t' ) {
            // Non printables are below ASCII space but not tab or line
            // terminator, ASCII delete, or above a certain Unicode threshold.
            if (ch < 0x10000) {
                _printer.printText((char)ch );
            } else {
                _printer.printText((char)(((ch-0x10000)>>10)+0xd800));
                _printer.printText((char)(((ch-0x10000)&0x3ff)+0xdc00));
            }
        } else {
            // The character is not printable, print as character reference.
            _printer.printText( "&#x" );
            _printer.printText(Integer.toHexString(ch));
            _printer.printText( ';' );
        }
    }
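
As a rough sketch of what borrowing the Xerces fallback might look like in
chardata(), the code below writes characters above 0xff as hexadecimal
character references instead of throwing. This is only an illustration under
stated assumptions, not a patch: CharRefEscaper and escape() are hypothetical
names, the 0xff threshold is carried over from chardata() rather than taken
from an encoding check like _encodingInfo.isPrintable(), and
IllegalArgumentException again stands in for XmlRpcException.

```java
// Hypothetical sketch; not the Xerces or Apache XML-RPC implementation.
final class CharRefEscaper {

    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            int ch = text.charAt(i);
            // Combine a surrogate pair into one code point, as printEscaped(String) does.
            if (Character.isHighSurrogate((char) ch) && i + 1 < text.length()
                    && Character.isLowSurrogate(text.charAt(i + 1))) {
                ch = Character.toCodePoint((char) ch, text.charAt(i + 1));
                i++;
            }
            switch (ch) {
            case '\t':
            case '\r':
            case '\n':
                out.append((char) ch);
                break;
            case '<':
                out.append("&lt;");
                break;
            case '>':
                out.append("&gt;");
                break;
            case '&':
                out.append("&amp;");
                break;
            default:
                if (ch < 0x20 || (ch >= 0xd800 && ch <= 0xdfff)) {
                    // Control characters and unpaired surrogates cannot be
                    // represented in XML 1.0, even as character references.
                    throw new IllegalArgumentException(
                        "Invalid character data: code point " + ch);
                } else if (ch > 0xff) {
                    // Fallback borrowed from printEscaped(int): hex character reference.
                    out.append("&#x").append(Integer.toHexString(ch)).append(';');
                } else {
                    out.append((char) ch);
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("\u20ac 5"));   // euro sign is above 0xff
        System.out.println(escape("\u00a3 5"));   // pound sign is within 0x20..0xff
    }
}
```

With this approach the caller passes the character itself (for example
"\u00a3" or "\u20ac") and the writer decides whether a character reference is
needed. A literal '&' in the input is still escaped in this sketch, just as
getEntityRef() would produce an entity reference for it in Xerces.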


-- 
Jesus M. Salvo Jr.
Mobile Internet Group Pty Ltd
(formerly Softgame International Pty Ltd)
M: +61 409 126699
T: +61 2 94604777
F: +61 2 94603677

PGP Public key: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xC0BA5348



