quetz-mod_python-dev mailing list archives

From Graham Dumpleton <grah...@dscpl.com.au>
Subject Re: mod_python, unicode, utf-8, latin1
Date Fri, 11 Aug 2006 23:05:50 GMT
For future reference, a general question like this is better posted to
the mod_python user mailing list than to the developer mailing list, as
it isn't related to the internal development of mod_python. There are
also a lot more people on the user mailing list, with much more diverse
knowledge, so you might get a quicker or better answer there.

Anyway, let's see if anyone comes up with anything on the developer
mailing list, but if you don't get an answer in a day or so, you might
instead post it to the more general user mailing list.

The user mailing list is the one mentioned on the mod_python home page.

BTW, changing the default character encoding in Python's site.py is, I
believe, not generally seen as a good idea. It would also help in future
if you specify which version of mod_python you are using and, in the
case of PSP, whether you are triggering PSP directly with mod_python.psp
as the handler or manually using PSP objects from a
mod_python.publisher handler.

Apart from those comments, I am not a Unicode person, so I don't know
the ins and outs of using Unicode with mod_python.

Graham

On 12/08/2006, at 8:33 AM, Earle Ady wrote:

> Aloha!
>
> I've done some searching online regarding character encoding and  
> UTF-8 support within mod_python, but haven't been able to get the  
> proper functionality out of mod_python.
>
> Here's the situation:  I have changed my site.py in Python 2.4.3 to  
> use "utf-8" as the default encoding.   I have a database with  
> correct unicode representations in it.  I execute routines from the  
> interpreter and get correct unicode objects out of the database.   
> When I run these exact routines from inside of a PSP page, the  
> unicode object has now been latin1 decoded.  Please note that from  
> the examples below that I am using identical MySQLdb connection  
> settings.
>
> I am still a bit unclear as to where exactly this is happening  
> inside of mod_python, and any advice to a solution would be greatly  
> appreciated.  It's pretty critical that a developer can provide  
> UTF-8 support in order for mod_python to gain traction in  
> enterprise applications.
>
> If this is a user error on my part, I'd greatly appreciate being  
> pointed to a proper solution.
>
> Best,
> earle.
>
> ------  THIS WORKS FROM WITHIN THE INTERPRETER:
> (conn, cursor) = util.DBConnect(MySQLdb.cursors.DictCursor)
>
> cursor.execute("SELECT * from unicode_test")
> items = cursor.fetchall()
>
> for item in items:
>         print item,
>
> # RESULTS:  correct unicode:
> # (earle@www-1 14:55 266) python utest.py
> # {'data': u'\u9577\u5ca1', 'id': 35L}
> # {'data': u'\u9577\u5ca1', 'id': 36L}
>
>
> -------  THIS DOES NOT WORK FROM .PSP, it produces a latin1 decoded  
> unicode object of the correct unicode (see below):
>
> <%
> req.content_type = 'text/html;charset=UTF-8;';
> %>
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
> "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
> <html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="en">
> <head>
> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
> </head>
> <body>
> <%@ include file="include/webglobals.psps" %>
> <%
> (conn, cursor) = util.DBConnect(MySQLdb.cursors.DictCursor)
> req.write("MYSQL CONNECTION CHARSET: ")
> req.write(conn.character_set_name())
> req.write("<p/>")
> req.write("SYS.DEFAULTENCODING: ")
> req.write(sys.getdefaultencoding())
> req.write("<p/>")
>
> res = cursor.execute("SELECT * from unicode_test")
> items = cursor.fetchall()
>
> for i in items:
>         #
>         req.write("DATA: ")
>         req.write(i['data'])
>         req.write(", item: ")
> %>
> <%= i %>
> <%
>         req.write(",  BYTES: ")
>         req.write(i['data'].encode('unicode_escape'))
>
>         req.write("<p/>")
>         #
> # end: items
>
> req.write("SHOULD LOOK LIKE THIS: %s" % ( u'\u9577\u5ca1', ))
> %>
> </body>
> </html>
>
> ---- RESULTS:
>
> MYSQL CONNECTION CHARSET: utf8
>
> SYS.DEFAULTENCODING: utf-8
>
> DATA: 長岡, item: {'data': u'\xe9\x95\xb7\xe5\xb2\xa1',  
> 'id': 35L} , BYTES: \xe9\x95\xb7\xe5\xb2\xa1
>
> DATA: 長岡, item: {'data': u'\xe9\x95\xb7\xe5\xb2\xa1',  
> 'id': 36L} , BYTES: \xe9\x95\xb7\xe5\xb2\xa1
>
> SHOULD LOOK LIKE THIS: 長岡
>
>
> -------   Notice if I latin1 decode the -correct- unicode object, i  
> get the exact
> unicode object that is appearing inside of the PSP:
>
> >>> u'\u9577\u5ca1'.decode('latin1')
> u'\xe9\x95\xb7\xe5\xb2\xa1'
>
>
>
>
>
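
The latin1 observation at the end of the quoted message can be
reproduced outside mod_python. The following is only a minimal sketch,
assuming that the UTF-8 bytes coming from the database are being decoded
as latin1 somewhere before they reach the PSP page; the thread does not
establish where that happens. The helper name repair_latin1_mojibake is
hypothetical, and the snippet is written for a current Python 3
interpreter rather than the Python 2.4.3 used in the thread.

```python
# -*- coding: utf-8 -*-
# Sketch only: reproduces the symptom from the thread on Python 3.

# The value the interpreter session returns from the database.
correct = u'\u9577\u5ca1'

# The same text as UTF-8 bytes, as it would be stored or transmitted.
wire_bytes = correct.encode('utf-8')

# Assumption: something decodes those bytes with latin1 instead of UTF-8.
# This yields the six-character string seen inside the PSP page.
garbled = wire_bytes.decode('latin1')


def repair_latin1_mojibake(text):
    """Undo a latin1 decode that was applied to UTF-8 bytes.

    Hypothetical helper: it treats the symptom, not the cause, and
    raises UnicodeError if the text was not garbled in this way.
    """
    return text.encode('latin1').decode('utf-8')


repaired = repair_latin1_mojibake(garbled)

# Encoding explicitly before writing to the response avoids any
# dependence on the interpreter-wide default encoding.
body = repaired.encode('utf-8')
```

If the assumption holds, a workaround of this kind would recover the
original text, but the cleaner fix is to find the layer that applies the
wrong codec. Encoding explicitly at the point of output also removes the
need to change the default encoding in site.py.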

