lucene-solr-user mailing list archives

From Peter Cline <pcl...@pobox.upenn.edu>
Subject Re: Illegal xml/html character; unicode problems near solr
Date Fri, 07 Mar 2008 20:25:00 GMT
Nicolas and Yonik,

Thank you both for your excellent responses; this fixed my problem.  Now
it's time to go back and remove all the hacks I was using to pin this
thing together without proper UTF-8 support.

Thanks again,
Peter

nicolas.dessaigne@arisem.com wrote:
> I think Tomcat defaults to the operating system's default encoding, e.g. cp1252
> on a typical Windows machine.
>
> You need to add the attribute URIEncoding="UTF-8" to the Connector element you
> use in server.xml.
>
> Nicolas
>
> -----Original Message-----
> From: yseeley@gmail.com [mailto:yseeley@gmail.com] On behalf of Yonik Seeley
> Sent: Friday, March 7, 2008 18:53
> To: solr-user@lucene.apache.org
> Subject: Re: Illegal xml/html character; unicode problems near solr
>
> On Fri, Mar 7, 2008 at 12:30 PM, Peter Cline <pcline@pobox.upenn.edu> wrote:
>   
>>  The following is a snippet of a link to use a facet:
>>  search-faceted.html?q=[* TO
>>  *]&amp;facet=true&amp;rows=25&amp;fq=name_facet:&#34;Brasseur de
>>  Bourbourg, abb%C3%A9, 1814-1874, former owner&#34;"
>>
>>  These characters are correctly specified. When it returns, I get an
>>  illegal character error. Examining the XML, I get an fq value of:
>>  name_facet:"Brasseur de Bourbourg, abbÃÂ(c), 1814-1874, former owner"
>>     
>
> Is this bad XML part of the responseHeader (parameters that are simply
> being echoed back)?
> If so, it's most likely the config of whatever servlet container you
> are using: you need to configure it to accept UTF-8 URLs rather than
> Latin-1 (Tomcat defaults to the old-style Latin-1, AFAIK).
>
> -Yonik
>
>   
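
The mechanism Yonik and Nicolas describe can be reproduced without Tomcat. The facet link sends "é" as the two UTF-8 bytes %C3%A9. A container that decodes the URI as UTF-8 turns them back into one character. A container that decodes it as Latin-1 (ISO-8859-1) turns each byte into its own character, U+00C3 "Ã" and U+00A9 "©". That is presumably where the "Ã" and "(c)" in the echoed fq value come from. The sketch below is only an illustration using java.net.URLEncoder and java.net.URLDecoder, not Tomcat's actual request-parsing code; the class and method names are hypothetical.

```java
import java.io.UncheckedIOException;
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class: imitates a container decoding a request URI
// with two different charsets.
public class UriEncodingDemo {

    // Percent-encode a parameter value as UTF-8 bytes, the way the
    // facet link in this thread carries "e-acute" as %C3%A9.
    static String encodeUtf8(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new UncheckedIOException(e); // UTF-8 is always available
        }
    }

    // Decode a percent-encoded value with the given charset, roughly what
    // a servlet container does with the query string it receives.
    static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String original = "abb\u00e9";          // "abbé"
        String encoded = encodeUtf8(original);  // "abb%C3%A9"

        // UTF-8 decoding: bytes C3 A9 become the single char U+00E9.
        String asUtf8 = decode(encoded, "UTF-8");
        // Latin-1 decoding: each byte becomes its own char, U+00C3 and U+00A9.
        String asLatin1 = decode(encoded, "ISO-8859-1");

        System.out.println(encoded);
        System.out.println(asUtf8.equals(original));
        System.out.println(asLatin1.equals("abb\u00c3\u00a9"));
    }
}
```

Running main prints abb%C3%A9 followed by true twice. Setting URIEncoding="UTF-8" on the Connector corresponds to the first decode call; leaving the container on Latin-1 corresponds to the second.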
