lucene-solr-user mailing list archives

From "av_work@yahoo.com" <av_w...@yahoo.com>
Subject Re: Solr Update Handler Fails with Some Doc Characters
Date Wed, 09 May 2007 16:06:13 GMT
Hi,

I specify the following content type and encoding in the HTTP header when POSTing the data to Solr:

text/xml; charset=utf-8

The encoding of the actual XML is also UTF-8.
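For reference, here is a minimal sketch of a client that keeps the declared charset and the actual body bytes in agreement. It assumes Python's standard `urllib.request` and uses `http://localhost:8983/solr/update` as an assumed URL for the example's XML update handler; `build_update_request` is a hypothetical helper name, not part of Solr.

```python
import urllib.request

# Assumption: the Solr example's XML update handler URL; adjust host, port and path.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_update_request(xml_text: str, url: str = SOLR_UPDATE_URL) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) an update POST."""
    # Encode the body with the SAME charset that the Content-Type header declares.
    body = xml_text.encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )
```

Sending it is then `urllib.request.urlopen(build_update_request(xml))`. The point of the design is that a single `encode("utf-8")` call produces the bytes, so the header cannot claim UTF-8 while the body is actually in a Windows code page.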

I see that the update handler fails even if the character is NOT right next to an XML closing
tag. If the character appears anywhere inside any of the XML elements, the update handler fails
to parse the XML.
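One way to confirm that the posted bytes really are UTF-8 is to run them through a strict decoder before sending. The sketch below uses only Python's built-in `bytes.decode`; `first_invalid_utf8_offset` is a hypothetical helper name. It reports where the first invalid sequence starts, for example a single Latin-1/cp1252 byte such as 0xE9 for "é".

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Hypothetical helper: byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        # Strict decoding raises UnicodeDecodeError at the first malformed sequence.
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None
```

If this returns an offset for the exact bytes being POSTed, printing `data[offset - 20:offset + 20]` shows which character was not encoded as UTF-8.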

Thanks,
Av

----- Original Message ----
From: Yonik Seeley <yonik@apache.org>
To: solr-user@lucene.apache.org
Sent: Wednesday, May 9, 2007 10:45:43 AM
Subject: Re: Solr Update Handler Fails with Some Doc Characters


On 5/9/07, av_work@yahoo.com <av_work@yahoo.com> wrote:
> I run the example using Jetty on a Windows 2003 machine. When I submit some documents
> containing upper ASCII characters (byte values above 127), the Solr update handler fails
> with an XML parsing error saying that it encountered an EOF before the closing tags.

Normally, if there is a charset mix-up, you will just get weird-looking results.
I suppose that if a byte with a value of 128 or greater is sent, and Solr is
treating the input as UTF-8, then the following byte(s) would be treated as part of
a single multibyte character. Hence, if that byte is directly followed
by XML markup, part of the markup will be lost (hence the parse
exception).
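To illustrate the hypothesis above, here is a deliberately naive, hypothetical decoder that trusts each UTF-8 lead byte's declared length. It is not Solr's or Jetty's actual decoder; real decoders may reject or replace the byte instead. It shows how a Latin-1 "é" (0xE9, which looks like the lead byte of a 3-byte sequence) can consume the `</` of a closing tag.

```python
def naive_utf8_decode(data: bytes) -> str:
    """Hypothetical lenient decoder that trusts the length implied by each lead byte."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            n = 1              # ASCII byte
        elif b >> 5 == 0b110:
            n = 2              # lead byte of a 2-byte sequence
        elif b >> 4 == 0b1110:
            n = 3              # lead byte of a 3-byte sequence
        elif b >> 3 == 0b11110:
            n = 4              # lead byte of a 4-byte sequence
        else:
            n = 1              # stray continuation byte or invalid byte
        seq = data[i:i + n]
        try:
            out.append(seq.decode("utf-8"))
        except UnicodeDecodeError:
            # The whole span becomes one replacement char, swallowing any markup in it.
            out.append("\ufffd")
        i += n
    return "".join(out)
```

Fed `"<field>café</field>".encode("latin-1")`, it yields `"<field>caf\ufffdfield>"`: the closing tag has lost its `</`, so an XML parser would hit EOF while still looking for `</field>`. The same text encoded as UTF-8 decodes cleanly.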

In short, this is probably a char encoding issue.  What character
encoding are you using when posting to Solr, and is it declared in the
HTTP header?

-Yonik

