commons-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "David Garcia (JIRA)" <j...@apache.org>
Subject [jira] Created: (LANG-617) StringEscapeUtils.escapeXML() can't process UTF-16 supplementary characters
Date Tue, 13 Apr 2010 15:25:49 GMT
StringEscapeUtils.escapeXML() can't process UTF-16 supplementary characters
---------------------------------------------------------------------------

                 Key: LANG-617
                 URL: https://issues.apache.org/jira/browse/LANG-617
             Project: Commons Lang
          Issue Type: Bug
          Components: lang.*
    Affects Versions: 2.4
            Reporter: David Garcia
            Priority: Minor


Supplementary characters in UTF-16 are those whose code points are above 0xffff, that is,
require more than 1 Java char to be encoded, as explained here: http://java.sun.com/developer/technicalArticles/Intl/Supplementary/

Currently, StringEscapeUtils.escapeXML() isn't aware of this coding scheme and treats each
char as one character, which is not always right.

A possible solution in class Entities would be:

    public void escape(Writer writer, String str) throws IOException {
        int len = str.length();
        for (int i = 0; i < len; i++) {
            int code = str.codePointAt(i);
            String entityName = this.entityName(code);
            if (entityName != null) {
                writer.write('&');
                writer.write(entityName);
                writer.write(';');
            } else if (code > 0x7F) {
                    writer.write("&#");
                    writer.write(code);
                    writer.write(';');
            } else {
                    writer.write((char) code);
            }

            if (code > 0xffff) {
                    i++;
            }
        }
    }

Besides fixing escapeXML(), this will also affect HTML escaping functions. I guess that's
a good thing, but please remember I have only tested escapeXML().


-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message