lucene-solr-user mailing list archives

From Sascha Szott <sz...@zib.de>
Subject Re: VelocityResponseWriter/Solritas character encoding issue
Date Wed, 18 Nov 2009 16:15:02 GMT
Hi Erik,

Erik Hatcher wrote:
> Can you give me a test document that causes an issue?  (maybe send me a 
> Solr XML document in private e-mail).   I'll see what I can do once I 
> can see the issue first hand.
Thank you! Just try the utf8-example.xml file in the exampledocs 
directory. After indexing the document, the output of the test_utf8.sh 
script suggests that everything works correctly:

  Solr server is up.
  HTTP GET is accepting UTF-8
  HTTP POST is accepting UTF-8
  HTTP POST does not default to UTF-8
  HTTP GET is accepting UTF-8 beyond the basic multilingual plane
  HTTP POST is accepting UTF-8 beyond the basic multilingual plane
  HTTP POST + URL params is accepting UTF-8 beyond the basic multilingual

If I use the standard QueryResponseWriter with the query q=umlauts, 
the XML response contains properly rendered non-ASCII characters. 
The same query against the VelocityResponseWriter returns a lot of 
Unicode replacement characters (U+FFFD) instead.
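
To illustrate what I suspect is going on, here is a minimal Python sketch. 
It is not Solr or Velocity code, and the sample string is made up; it only 
assumes that the page bytes are written as ISO-8859-1 while the browser 
decodes them as UTF-8, which I have not confirmed:

```python
# Illustrative only: simulates the suspected charset mismatch.
# The sample text is hypothetical (a-umlaut, o-umlaut, u-umlaut).
text = "Umlaute: \u00e4 \u00f6 \u00fc"

# Assumption: the writer emits ISO-8859-1 bytes ...
latin1_bytes = text.encode("iso-8859-1")

# ... while the browser follows the declared charset=UTF-8.
# Bytes that are invalid in UTF-8 become U+FFFD replacement characters.
as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

# Decoding with the encoding that was actually used recovers the text.
as_latin1 = latin1_bytes.decode("iso-8859-1")

print(repr(as_utf8))
print(repr(as_latin1))
```

If that assumption holds, it would also explain why manually switching 
the browser to ISO-8859-1 makes the characters appear correctly.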

-Sascha

> 
> On Nov 18, 2009, at 2:48 PM, Sascha Szott wrote:
> 
>> Hi,
>>
>> I've played around with Solr's VelocityResponseWriter (which is indeed 
>> a very useful feature for rapid prototyping). I noticed that 
>> Velocity uses ISO-8859-1 as its default character encoding. I've changed 
>> this setting to UTF-8 in my velocity.properties file (inside the conf 
>> directory), i.e.,
>>
>>   input.encoding=UTF-8
>>   output.encoding=UTF-8
>>
>> and checked that the settings were successfully loaded.
>>
>> Within the main Velocity template, browse.vm, the character encoding 
>> is set to UTF-8 as well, i.e.,
>>
>>   <meta http-equiv="content-type" content="text/html; charset=UTF-8"/>
>>
>> After starting Solr (which is deployed in a Tomcat 6 server on an 
>> Ubuntu machine), I ran into some character encoding problems.
>>
>> Due to the change of input.encoding to UTF-8, no problems occur when 
>> non-ASCII characters are present in the query string, e.g. German 
>> umlauts. But unfortunately, something is wrong with the encoding of 
>> characters in the HTML page that is generated by 
>> VelocityResponseWriter. The non-ASCII characters aren't displayed 
>> properly (for example, Firefox prints a black diamond with a white 
>> question mark). If I manually set the encoding to ISO-8859-1, the 
>> non-ASCII characters are displayed correctly. Does anybody have a clue?
>>
>> Thanks in advance,
>> Sascha

