struts-user mailing list archives

From J.Patterson Waltz III <>
Subject Re: Character encoding problems after 1.1 to 1.2.4 upgrade
Date Thu, 06 Jan 2005 16:38:00 GMT

On 6 Jan 05, at 17:17, Guillaume Cottenceau wrote:

> J.Patterson Waltz III <lists 'at'> writes:
>> On 6 Jan 05, at 15:52, J.Patterson Waltz III wrote:
>>> Now, I guess I'll just have to try using the character encoding
>>> filter Guillaume recommended.
>> Ack! I'm about to pull my hair out over these encoding issues. I added
>> the SetCharacterEncodingFilter from the Tomcat 5 distribution to my
>> web application, with just enough mods to get some logging output from
>> it so I'd know it was doing its thing.
>> So now I have the following in place to ensure incoming and outgoing
>> UTF-8 encoding:
>> - A <%@ page pageEncoding="UTF-8"
>> contentType="text/html;charset=UTF-8" language="java" %> directive
>> - an acceptCharset="UTF-8" attribute on <html:form> tags
>> - an  enctype="application/x-www-form-urlencoded;charset=UTF-8"
>> attribute on <html:form> tags
>> - the SetCharacterEncodingFilter, configured to interpret UTF-8 no
>> matter what
>> and yet I'm *still* getting non-decoded UTF-8 displayed in my pages
>> (i.e. été is displayed as Ã©tÃ©).
>> Guillaume, did you actually get UTF-8 to work using the filter
>> solution? If so, can you (or anyone) think of anything else I might
>> have missed? Thanks in advance.
> Yes, it works.
> First, verify `tomcat->browser': please try to render your page
> with "wget -S" to see precisely the headers (Content-Type must
> specify UTF-8) and the contents (double-check the output is
> UTF-8) (to verify your browser is not bugged).

Here are the tomcat->browser headers of the page which contains the form:
HTTP/1.1 200 OK
Content-Type: text/html;charset=UTF-8
Content-Language: en
X-Transfer-Encoding: chunked
Date: Thu, 06 Jan 2005 16:30:08 GMT
Server: Apache-Coyote/1.1
Content-length: 16401

> Second, verify `browser->tomcat': use a proxy (or netcat in
> listen mode) to precisely see what headers your browser is
> sending (if you will use the filter to force UTF-8, that doesn't
> matter much) and the encoding of the data. Typically, browsers
> will encode in UTF-8 if the page containing the form was using
> UTF-8 itself, but accept-charset can do no harm, but as you
> noticed they don't set the charset in the Content-Type header
> they use (according to mozilla's bugzilla, it's because it breaks
> too many servers); but you have to double-check that (in my
> experience, mozilla and MSIE do work).
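
To illustrate the point about the missing charset parameter: when a browser sends "Content-Type: application/x-www-form-urlencoded" with no charset, the server has to pick an encoding on its own, and servlet containers default to ISO-8859-1 for the request body. The sketch below only shows that lookup-with-fallback logic. The class and method names (RequestCharset, charsetOf) are made up for illustration and are not part of Tomcat or Struts.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, not a Tomcat or Struts API: extracts the charset
// parameter from a Content-Type header value, or returns a fallback.
public class RequestCharset {

    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        // Parameters follow the media type, separated by ';'.
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name: use the fallback.
                    return fallback;
                }
            }
        }
        // No charset parameter at all, as in the Firefox POST in this thread.
        return fallback;
    }

    public static void main(String[] args) {
        // No charset in the header, so the fallback decides the outcome.
        System.out.println(charsetOf("application/x-www-form-urlencoded",
                StandardCharsets.ISO_8859_1));
        // Explicit charset in the header wins over the fallback.
        System.out.println(charsetOf("text/html;charset=UTF-8",
                StandardCharsets.ISO_8859_1));
    }
}
```

A filter that forces UTF-8 effectively replaces that fallback, which is why the header the browser sends "doesn't matter much" once the filter is really applied to the request before any parameter is read.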

And here's the POST request from Firefox, including the form data
(sensitive data manually replaced with x's):

User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0
Accept-Language: fr-fr,en-us;q=0.7,en;q=0.3
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Cookie: lang=en; JSESSIONID=035AC799FB05BE35FE9B9E96D0664930
Content-Type: application/x-www-form-urlencoded
Content-Length: 841


Notice in the third line of the form data:  
That's 'été' URLencoded as UTF-8.
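
For anyone following along, here is a small standalone check of what the symptom implies, written as a sketch rather than anything taken from my application. It URL-encodes 'été' as UTF-8, then decodes the result once as UTF-8 and once as ISO-8859-1. The class name EncodingMismatchDemo and its method names are made up, and it assumes Java 10 or later for the Charset overloads of URLEncoder and URLDecoder.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows how UTF-8 form data turns into garbage
// when the receiving side decodes it as ISO-8859-1.
public class EncodingMismatchDemo {

    // What the browser does when the page containing the form is UTF-8.
    static String encodeAsUtf8(String text) {
        return URLEncoder.encode(text, StandardCharsets.UTF_8);
    }

    // What the server does, using whichever charset it believes applies.
    static String decodeWith(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file independent of its own encoding.
        String original = "\u00e9t\u00e9"; // 'été'

        String onTheWire = encodeAsUtf8(original);
        System.out.println("on the wire:        " + onTheWire);

        // Correct round trip: same charset on both sides.
        System.out.println("decoded as UTF-8:   "
                + decodeWith(onTheWire, StandardCharsets.UTF_8));

        // Mismatch: each UTF-8 byte becomes one ISO-8859-1 character.
        System.out.println("decoded as Latin-1: "
                + decodeWith(onTheWire, StandardCharsets.ISO_8859_1));
    }
}
```

The Latin-1 decoding gives two characters for every é, which matches what I see in my pages. So the browser side looks correct, and the wrong decoding would have to be happening on the server side, presumably before the filter gets to set the request encoding.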

So I'm still stumped. :-(
