lucene-solr-user mailing list archives

From "Yonik Seeley" <ysee...@gmail.com>
Subject Re: Cyrillic characters
Date Wed, 19 Jul 2006 19:12:50 GMT
On 7/19/06, WHIRLYCOTT <phil@whirlycott.com> wrote:
> Solr-trunk currently uses ISO-8859-1 as the character encoding for
> the admin pages.  One of the patches I submitted changes the admin
> pages to use UTF-8 and that fixes the problem.

OK, we are closer to working correctly.  It appears that the web
browsers are trying to be smart when submitting form data and using
the encoding of the received page to submit the HTTP-GET (non-standard
behaviour as I read it, but it may be to support legacy stuff).

So changing the admin pages to use UTF-8, and clearing the browser
caches, does indeed make both Firefox and IE send percent-encoded
UTF-8 (h%C3%A9llo).
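
For anyone who wants to see the difference on the wire, here is a minimal sketch in Python (my choice for illustration, not something the browsers or Solr use) of how the same word percent-encodes under each page encoding. The variable names are made up for the example.

```python
from urllib.parse import quote

word = "h\u00e9llo"  # "héllo", with U+00E9 (e with acute accent)

# What a browser sends when the form page was served as UTF-8:
# U+00E9 becomes the two bytes C3 A9, each percent-encoded.
utf8_form = quote(word, encoding="utf-8")

# What it sends when the page was served as ISO-8859-1:
# U+00E9 is the single byte E9.
latin1_form = quote(word, encoding="iso-8859-1")

print(utf8_form)    # h%C3%A9llo
print(latin1_form)  # h%E9llo
```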

Now the problem: Tomcat 5.5.17 isn't decoding percent-encoded UTF-8,
but instead treating %C3%A9 as two separate characters.  Soooo, I
think Bertrand is right about there being some web.xml setting....
time to hit the tomcat docs, and if that fails, grab Yoav's attention
:-)
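
The symptom can be reproduced outside Tomcat. This is only a sketch of the decoding mismatch, assuming the container is falling back to ISO-8859-1 for the query string; it does not touch any Tomcat API.

```python
from urllib.parse import unquote

received = "h%C3%A9llo"  # what the browser sent

# Correct: treat the percent-decoded bytes C3 A9 as one UTF-8 sequence.
as_utf8 = unquote(received, encoding="utf-8")

# The observed behaviour: each byte is mapped to its own ISO-8859-1
# character, so C3 and A9 come out as two separate characters.
as_latin1 = unquote(received, encoding="iso-8859-1")

print(len(as_utf8), as_utf8)      # 5 characters
print(len(as_latin1), as_latin1)  # 6 characters
```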

I would be interested to know what some of the built-in http client
libs out there do:
  - HTTPClient, python, ruby, rhino, etc
Hopefully most do the right thing w.r.t. UTF-8, but if not, one can
always POST queries with a Content-Type header that specifies
charset=UTF-8.
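
As a rough sketch of that fallback, here is how a client could build a POST whose body is percent-encoded UTF-8 and whose header names the charset explicitly. The URL and the helper name are assumptions for illustration only, and the request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint; substitute whatever your installation exposes.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_query_request(query):
    """Build (but do not send) a POST carrying the query as UTF-8 form data."""
    # urlencode percent-encodes the UTF-8 bytes, so the result is pure ASCII.
    body = urlencode({"q": query}, encoding="utf-8").encode("ascii")
    return Request(
        SOLR_SELECT_URL,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"
        },
        method="POST",
    )


req = build_query_request("h\u00e9llo")
print(req.data)  # b'q=h%C3%A9llo'
```

Whether the server honours the declared charset still depends on the container, so this is worth testing against the actual setup.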


-Yonik
