lucene-solr-user mailing list archives

Subject Re: Cyrillic characters
Date Tue, 18 Jul 2006 22:09:00 GMT
Crap, you're right.  I have a well-tested application that's using  
UTF-8 everywhere possible and I just tested with some Russian text.   
Solr's coughing up this as an exception:

Jul 18, 2006 6:00:05 PM org.apache.solr.core.SolrException log
SEVERE: java.lang.ArrayIndexOutOfBoundsException: 1
         at org.apache.solr.core.SolrCore.execute(
         at org.apache.solr.servlet.SolrServlet.doGet 
         at javax.servlet.http.HttpServlet.service(
         at javax.servlet.http.HttpServlet.service(
         at org.mortbay.jetty.servlet.ServletHolder.handle 
         at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch 
         at org.mortbay.jetty.servlet.ServletHandler.handle 
         at org.mortbay.http.HttpContext.handle(
         at org.mortbay.jetty.servlet.WebApplicationContext.handle 
         at org.mortbay.http.HttpContext.handle(
         at org.mortbay.http.HttpServer.service(
         at org.mortbay.http.HttpConnection.service 
         at org.mortbay.http.HttpConnection.handleNext 
         at org.mortbay.http.HttpConnection.handle 
         at org.mortbay.http.SocketListener.handleConnection 
         at org.mortbay.util.ThreadedServer.handle 
         at org.mortbay.util.ThreadPool$ 

You're going directly against Solr/Jetty, right?  Not proxied or  
mod_rewrite'd through to Apache?

Solr isn't properly decoding the data the servlet receives.  I think
I can fix this using some of the tricks I learned while building my
site.  More later.
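A minimal Java sketch of the kind of workaround meant here. It assumes the servlet container percent-decodes the query string as ISO-8859-1 rather than UTF-8, and shows both the garbled result and the common repair of reinterpreting the recovered bytes as UTF-8. `Latin1Repair`, `decodeAsLatin1`, and `reinterpretAsUtf8` are hypothetical names, not part of Solr or Jetty; the input is the UTF-8 query from the message quoted below. Requires Java 10 or later for `URLDecoder.decode(String, Charset)`.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

final class Latin1Repair {
    // Hypothetical stand-in for a container that decodes the raw query value as ISO-8859-1.
    static String decodeAsLatin1(String rawQueryValue) {
        return URLDecoder.decode(rawQueryValue, StandardCharsets.ISO_8859_1);
    }

    // Workaround: ISO-8859-1 maps every char 0-255 back to exactly one byte,
    // so the original bytes can be recovered and reinterpreted as UTF-8.
    static String reinterpretAsUtf8(String latin1Decoded) {
        byte[] originalBytes = latin1Decoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // UTF-8 bytes of the Russian word for "Canada", percent-encoded.
        String raw = "%D0%9A%D0%B0%D0%BD%D0%B0%D0%B4%D0%B0";
        String garbled = decodeAsLatin1(raw);
        String repaired = reinterpretAsUtf8(garbled);
        String direct = URLDecoder.decode(raw, StandardCharsets.UTF_8);
        // 12 bytes become 12 Latin-1 chars, but only 6 characters when decoded as UTF-8.
        System.out.println(garbled.length() + " " + direct.length()); // 12 6
        System.out.println(repaired.equals(direct)); // true
    }
}
```

The round trip is lossless only because ISO-8859-1 assigns a character to every byte value; configuring the container to decode URIs as UTF-8 in the first place avoids the double decode altogether.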

How much testing have people done using UTF-8 data on Solr?


On Jul 18, 2006, at 5:53 PM, Tricia Williams wrote:

> Hi all,
>    I'm trying to adapt our old Cocoon/Lucene-based web search
> application to one that is more Solr-ish.  Our old web app was
> capable of searching for queries containing Cyrillic characters.
> I'm finding that, in the packaged example admin interface,
> entering a query containing a string of Cyrillic characters causes a
> java.lang.ArrayIndexOutOfBoundsException. I've also noticed that the
> URL built from the search form is not UTF-8 encoded.  So if I try
> to manipulate the query string by inserting a UTF-8 encoded string
> in the q= parameter, the value is interpreted incorrectly, and I
> cannot use this approach as a workaround.  My sample query is:
> Канада (the English word _canada_ translated into Russian), or
> %D0%9A%D0%B0%D0%BD%D0%B0%D0%B4%D0%B0 (UTF-8), or
> %26%231050%3B%26%231072%3B%26%231085%3B%26%231072%3B%26%231076%3B%26%231072%3B
> (Solr URL encoding).
>    I would appreciate any advice or suggestions that would allow me
> to search for Cyrillic text in Solr.  If anyone knows why Solr is
> behaving this way with the strange encoding, a brief explanation
> of what causes this behaviour, and of what the encoding is
> (Unicode?), would be helpful.  If anyone else has successfully
> forced Solr to accept UTF-8 encoded q= parameters, I would love to
> know how you did it.
> Thanks in advance!
> Tricia
> P.S.  I am using Mozilla Firefox as my main browser, which produces
> the behaviour I reported above.  IE 6.0 works fine for Cyrillic,
> although it still uses a strange but different encoding
> (%CA%E0%ED%E0%E4%E0 for the same query as before).
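On the question of what the three encodings are: a small Java sketch that decodes each form of the sample query quoted above. It assumes the Firefox form was submitted from a page whose charset could not represent Cyrillic, so the browser sent decimal HTML numeric character references (`&#1050;` and so on) that were then percent-encoded, and that the IE bytes are windows-1251, which is the charset they match. `QueryEncodings`, `percentDecode`, and `expandNcrs` are hypothetical names; Java 10 or later is assumed.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QueryEncodings {
    // Matches a decimal HTML numeric character reference such as &#1050;
    private static final Pattern NCR = Pattern.compile("&#(\\d+);");

    // Percent-decode a query value and interpret the resulting bytes in the named charset.
    static String percentDecode(String value, String charsetName) {
        return URLDecoder.decode(value, Charset.forName(charsetName));
    }

    // Replace each decimal numeric character reference with the code point it names.
    static String expandNcrs(String text) {
        Matcher m = NCR.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            int codePoint = Integer.parseInt(m.group(1));
            String ch = new String(Character.toChars(codePoint));
            m.appendReplacement(out, Matcher.quoteReplacement(ch));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // UTF-8 bytes, percent-encoded.
        String utf8 = percentDecode("%D0%9A%D0%B0%D0%BD%D0%B0%D0%B4%D0%B0", "UTF-8");
        // Percent-encoded numeric character references (the Firefox form).
        String ncr = expandNcrs(percentDecode(
                "%26%231050%3B%26%231072%3B%26%231085%3B%26%231072%3B%26%231076%3B%26%231072%3B",
                "UTF-8"));
        // Single-byte windows-1251 (the IE form).
        String cp1251 = percentDecode("%CA%E0%ED%E0%E4%E0", "windows-1251");
        // All three decode to the same six Cyrillic characters.
        System.out.println(utf8.equals(ncr) && ncr.equals(cp1251)); // true
    }
}
```

Under these assumptions none of the forms is Solr-specific: two are raw bytes in different character sets, and the third spells out Unicode code points as HTML entities, so whatever decodes the request has to know which form it received.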

                                    Philip Jacob
