axis-c-dev mailing list archives

From "Minifie, Todd" <todd.mini...@cgi.com>
Subject Take me off of the mailing list PLEASE.
Date Fri, 08 May 2009 20:20:30 GMT


-----Original Message-----
From: David K. Taylor (JIRA) [mailto:jira@apache.org] 
Sent: Friday, May 08, 2009 11:28 AM
To: axis-c-dev@ws.apache.org
Subject: [jira] Updated: (AXIS2C-1265) guththila does not support Chinese and the Japanese.


     [ https://issues.apache.org/jira/browse/AXIS2C-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David K. Taylor updated AXIS2C-1265:
------------------------------------

    Attachment: utf8-patch.txt

This patch provides UTF-8 support when reading SOAP messages through Guththila.  Because libiconv
is an optional dependency, I hand-coded a UTF-8 transcoder.  I don't use libiconv myself, so I
did not add an optional code path that uses it.  That would be a good addition.

This patch was built successfully on the official 1.6.0 release.  It also includes unit tests
under guththila/tests for the new transcoder (both decode and encode, though only decode is
really used).  To run these tests, since they are not executed as part of the regular "make
check" target, use these commands:

cd guththila/tests
./s
./reader

The decoder test takes a few minutes since it covers the entire Unicode code point space.
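
For readers who have not opened the attachment, the kind of work a UTF-8 decoder does can be sketched roughly as below.  This is not the code from utf8-patch.txt; the function name utf8_decode_one and its return convention are hypothetical, chosen only for illustration.  It decodes a single code point and rejects truncated sequences, overlong forms, surrogates, and values above U+10FFFF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, NOT taken from utf8-patch.txt.
 * Decodes one code point from buf[0..len).  On success, stores the code
 * point in *cp and returns the number of bytes consumed (1 to 4).
 * Returns 0 if the input is empty, truncated, or not valid UTF-8. */
size_t utf8_decode_one(const unsigned char *buf, size_t len, uint32_t *cp)
{
    /* Smallest code point that legitimately needs 2, 3, or 4 bytes. */
    static const uint32_t min_for_len[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t need, i;
    uint32_t value;

    assert(buf != NULL && cp != NULL);
    if (len == 0)
        return 0;

    if (buf[0] < 0x80) {                    /* plain ASCII */
        *cp = buf[0];
        return 1;
    } else if ((buf[0] & 0xE0) == 0xC0) {   /* 110xxxxx */
        need = 2; value = buf[0] & 0x1F;
    } else if ((buf[0] & 0xF0) == 0xE0) {   /* 1110xxxx */
        need = 3; value = buf[0] & 0x0F;
    } else if ((buf[0] & 0xF8) == 0xF0) {   /* 11110xxx */
        need = 4; value = buf[0] & 0x07;
    } else {
        return 0;                           /* stray continuation or bad lead byte */
    }

    if (len < need)
        return 0;                           /* truncated sequence */
    for (i = 1; i < need; i++) {
        if ((buf[i] & 0xC0) != 0x80)
            return 0;                       /* not a 10xxxxxx continuation byte */
        value = (value << 6) | (uint32_t)(buf[i] & 0x3F);
    }

    if (value < min_for_len[need])
        return 0;                           /* overlong encoding */
    if (value >= 0xD800 && value <= 0xDFFF)
        return 0;                           /* UTF-16 surrogate range */
    if (value > 0x10FFFF)
        return 0;                           /* beyond the Unicode range */

    *cp = value;
    return need;
}
```

Note that the buffer is typed as unsigned char, so bytes of 0x80 and above are never seen as negative values.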

This patch does not completely solve the UTF-8 issue, but handles the most common case.  These
issues remain:

1) Still uses isspace and isalpha for XML tag names and attribute names, which depend on the
locale set in the environment.

2) Only accepts UTF-8, not other encodings.  (Using iconv could improve this as well.)

3) Ignores possible encoding set in the XML declaration.

4) Ignores possible encoding set in HTTP Content-Type.

5) Invalid UTF-8 bytes can only be ignored.  There should be an option to escape them instead.
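
On item 1, one possible direction (a sketch only, not part of the patch) is to replace the locale-dependent <ctype.h> calls with checks tied to the XML 1.0 grammar.  The function names below are hypothetical.  The name-start check covers only the ASCII subset of NameStartChar; non-ASCII name characters would need the decoded code point and are left out here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers, not from Guththila or the patch. */

/* XML 1.0 whitespace (production S): space, tab, CR, LF and nothing else.
 * Unlike isspace(), the result does not depend on the current locale. */
int xml_is_space(int c)
{
    return c == 0x20 || c == 0x09 || c == 0x0D || c == 0x0A;
}

/* ASCII subset of XML NameStartChar: letters, '_' and ':'.
 * Assumption: non-ASCII name characters are handled elsewhere. */
int xml_is_ascii_name_start(int c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           c == '_' || c == ':';
}

/* Returns the number of leading XML whitespace bytes in buf[0..len). */
size_t xml_skip_spaces(const unsigned char *buf, size_t len)
{
    size_t n = 0;

    assert(buf != NULL);
    while (n < len && xml_is_space(buf[n]))
        n++;
    return n;
}
```

Because these take plain integer comparisons, a byte such as 0xA0 is never treated as whitespace, whatever locale the process runs in.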

> guththila does not support Chinese and the Japanese.
> ----------------------------------------------------
>
>                 Key: AXIS2C-1265
>                 URL: https://issues.apache.org/jira/browse/AXIS2C-1265
>             Project: Axis2-C
>          Issue Type: Bug
>          Components: guththila
>    Affects Versions: 1.5.0
>         Environment: windows xp sp2 japan
>            Reporter: songlei
>         Attachments: utf8-patch.txt
>
>
> data:
> a.xml
> <?xml version='1.0' encoding='UTF-8'?>
> <ns:parameter xmlns:ns="urn:ns">
> <ns:unit xmlns:ns="urn:ns">
> 	<ns:name>name</ns:name>
> 	<ns:type>1</ns:type>
> 	<ns:displayname>門雷:名前</ns:displayname>
> 	<ns:value>2</ns:value>
> </ns:unit>
> </ns:parameter>
> ---------------------------------------------------------------------
> code:
> axiom_node_t *root_node = NULL;
> axiom_node_t *child = NULL;
> axiom_document_t *document = NULL;
> axiom_stax_builder_t *om_builder = NULL;
> axiom_xml_reader_t *xml_reader = NULL;
> f = fopen("a.xml","r");
> xml_reader = axiom_xml_reader_create_for_io(env, read_input_callback, close_input_callback,
>                                             NULL, "UTF-8");
> om_builder = axiom_stax_builder_create(env, xml_reader);
> document = axiom_stax_builder_get_document(om_builder, env);
> root_node = axiom_document_get_root_element(document, env);
> axiom_document_build_all(document, env);
> child = axiom_node_get_first_child(root_node, env);
> --------------------------------------------------------------------------------------------
> result:
> The parse result is shown below:
> <ns:parameter xmlns:ns="urn:ns">
> <ns:unit xmlns:ns="urn:ns">
> 	<ns:name>name</ns:name>
> 	<ns:type>1</ns:type>
> 	<ns:displayname></ns:displayname>
> </ns:unit>
> </ns:parameter>
> The displayname text and the value element are lost.
> ---------------------------------------------------------------------------------------------------------------
> debug:
> .\axis2c\guththila\src\guththila_xml_parser.c (lines 1532-1535):
>     c = m->buffer.buff[m->buffer.cur_buff][m->next++ -
>             GUTHTHILA_BUFFER_PRE_DATA_SIZE(m->buffer)];
>     return c >= 0 ? c : -1;
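> c is an int, but
> m->buffer.buff[m->buffer.cur_buff][m->next++ - GUTHTHILA_BUFFER_PRE_DATA_SIZE(m->buffer)]
> is a char.
> The range of a signed char is -128 to 127.
> The bytes starting at char[i] encode 門.
> As an unsigned value, char[i] is greater than 127.
> When that char is converted to int, c < 0, so the function returns -1.
> As a result, om_builder->done = true.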
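
The sign-extension problem described in the report can be reproduced outside Guththila.  The sketch below is illustrative only: both function names are hypothetical, and the buffer access is simplified to a flat array.  The first function mirrors the reported pattern; the second shows one common remedy, casting through unsigned char before widening to int.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical functions for illustration, not Guththila code. */

/* Mirrors the reported pattern.  Where plain char is signed, any byte
 * >= 0x80 becomes a negative int and is reported as -1, which the caller
 * cannot tell apart from end of input. */
int next_byte_as_reported(const char *buf, size_t pos)
{
    int c;

    assert(buf != NULL);
    c = buf[pos];
    return c >= 0 ? c : -1;
}

/* One common remedy: convert to unsigned char first, so the result is
 * always in the range 0..255 regardless of the signedness of char. */
int next_byte_unsigned(const char *buf, size_t pos)
{
    assert(buf != NULL);
    return (unsigned char)buf[pos];
}
```

With the UTF-8 bytes of 門 (0xE9 0x96 0x80), next_byte_unsigned returns 0xE9 for the first byte on any platform, whereas next_byte_as_reported returns -1 wherever plain char is signed.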

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

