lucene-solr-user mailing list archives

From Andre Bois-Crettez <andre.b...@kelkoo.com>
Subject Re: Solr exception when parsing XML
Date Wed, 16 Jan 2013 13:26:49 GMT
Forgot the link: http://en.wikipedia.org/wiki/Valid_characters_in_XML

André

On 01/16/2013 02:24 PM, Andre Bois-Crettez wrote:
> It is worth noting that some characters are completely forbidden in XML,
> such as "chr(0)" (the NUL character).
> When dealing with external text input, some cleanup might be necessary
> to avoid breaking indexing.
> For example, you could replace each forbidden XML character with " ".
>
> André
>
> On 01/15/2013 09:55 PM, Alexandre Rafalovitch wrote:
>> Interesting point. Looks like CDATA is more limiting than I thought:
>> http://en.wikipedia.org/wiki/CDATA#Issues_with_encoding . Basically, the
>> recommendation is to avoid CDATA and automatically encode characters such
>> as yours, as well as the less-than and greater-than signs and the ampersand.
>>
>> Regards,
>>      Alex.
>>
>> --
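
The two suggestions quoted above (replace forbidden XML characters with " ", and encode less-than, greater-than and ampersand rather than relying on CDATA) could be sketched roughly as follows. This is only an illustration in Python, not code from the thread: the function names strip_invalid_xml_chars and to_xml_text are hypothetical, and the character ranges are assumed to follow the XML 1.0 "Char" production described on the Wikipedia page linked above.

```python
import re
from xml.sax.saxutils import escape

# Assumption: XML 1.0 allows only #x9, #xA, #xD, #x20-#xD7FF,
# #xE000-#xFFFD and #x10000-#x10FFFF. Anything else is forbidden.
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text, replacement=" "):
    """Replace every character that XML 1.0 forbids (hypothetical helper)."""
    return _INVALID_XML_CHARS.sub(replacement, text)


def to_xml_text(text):
    """Clean the text, then encode &, < and > as entities (hypothetical helper)."""
    return escape(strip_invalid_xml_chars(text))


if __name__ == "__main__":
    raw = "price < 10 & name\x00ok"
    print(to_xml_text(raw))  # prints: price &lt; 10 &amp; name ok
```

The cleanup runs before the escaping so that the entities produced by escape() are never touched by the replacement step. Tabs, newlines and carriage returns are kept because XML 1.0 allows them.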
