lucene-solr-user mailing list archives

From "Lance Norskog" <goks...@gmail.com>
Subject RE: solr.py problems with german "Umlaute"
Date Thu, 06 Sep 2007 18:35:26 GMT
I have researched this problem before. What I found is that Python strings
are byte strings, not Unicode, by default. You have to decode them explicitly
(or write them as u'...' literals) to make them Unicode.
Here are the links I found:

http://www.reportlab.com/i18n/python_unicode_tutorial.html
 
http://evanjones.ca/python-utf8.html
 
http://jjinux.blogspot.com/2006/04/python-protecting-utf-8-strings-from.html


We UTF-8 encode our strings and submit them, and they still end up badly
encoded when stored. We are seeing the problem shown in the "Marc-Andre
Lemburg" section of the reportlab.com link: an e with a forward (acute)
accent becomes some Japanese character.
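
One way this kind of corruption can happen is that UTF-8 bytes get read back with a different codec. The sketch below is only an illustration: Shift-JIS is an assumed example codec, and I have not confirmed which codec is actually involved in our setup. It should run on Python 3.3+ (and 2.6+).

```python
# -*- coding: utf-8 -*-
# UTF-8 bytes for U+00E9 (e with acute accent): 0xC3 0xA9.
utf8_bytes = u'\u00e9'.encode('utf-8')

# Wrong: reading the same two bytes as Shift-JIS (assumed codec, for
# illustration only) yields two half-width katakana characters.
misread = utf8_bytes.decode('shift_jis')

# Right: decode with the codec that produced the bytes.
roundtrip = utf8_bytes.decode('utf-8')

print(repr(utf8_bytes), repr(misread), repr(roundtrip))
```

The bytes themselves are never wrong here; only the codec used to interpret them changes, which is why the damage tends to show up far from where it was introduced.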

-----Original Message-----
From: news [mailto:news@sea.gmane.org] On Behalf Of Christian Klinger
Sent: Thursday, September 06, 2007 2:55 AM
To: solr-user@lucene.apache.org
Subject: solr.py problems with german "Umlaute"

Hi all,

I am trying to add/update documents with
the Python solr.py API.

Everything works fine so far,
but if I try to add a document which contains German umlauts (ö, ä, ü, ...) I
get errors.

Maybe someone has an idea how I could convert my data?
Should I post this to JIRA?

Thanks for help.

Btw: I have no sitecustomize.py.

This is my script:
------------------------------------------------------
from solr import *
title="Übersicht"
kw = {'id':'12','title':title,'system':'plone','url':'http://www.google.de'}
c = SolrConnection('http://192.168.2.13:8080/solr')
c.add_many([kw,])
c.commit()
------------------------------------------------------

This is the error:

   File "t.py", line 5, in ?
     c.add_many([kw,])
   File "/usr/local/lib/python2.4/site-packages/solr.py", line 596, in add_many
     self.__add(lst, doc)
   File "/usr/local/lib/python2.4/site-packages/solr.py", line 710, in __add
     lst.append('<field name=%s>%s</field>' % (
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
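
The error above is consistent with a UTF-8 byte string being decoded implicitly with the ASCII codec. Byte 0xc3 is the first byte of "Ü" in UTF-8. A minimal sketch of the failure and one possible way around it follows. `to_text` is a hypothetical helper, not part of solr.py, and whether solr.py accepts the resulting Unicode value is an assumption I have not verified. The `except ... as` syntax needs Python 2.6 or later, so the sketch would need adjusting for the Python 2.4 shown in the traceback.

```python
# -*- coding: utf-8 -*-
# "Übersicht" as UTF-8 bytes, which is what a plain string literal holds
# on Python 2 when the source file is saved as UTF-8.
raw_title = b'\xc3\x9cbersicht'


def to_text(value, encoding='utf-8'):
    """Hypothetical helper: decode byte strings, pass text through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Decoding as ASCII fails on the first byte, as in the traceback.
try:
    raw_title.decode('ascii')
    error = None
except UnicodeDecodeError as exc:
    error = str(exc)

# Decoding as UTF-8 first gives a Unicode value that formats cleanly.
title = to_text(raw_title)
field = u'<field name="%s">%s</field>' % (u'title', title)

print(error)
print(repr(field))
```

In the script above, that would mean passing `to_text(title)` (or a `u"..."` literal together with a source encoding declaration) instead of the raw byte string.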

