lucene-solr-user mailing list archives

From Brian Whitman <>
Subject invalid XML character
Date Sat, 01 Mar 2008 21:22:35 GMT
Once in a while we get this error:

ParseError at [row,col]:[4,790470]
[14:32:21.877] Message: An invalid XML character (Unicode: 0x6) was found in the element content of the document.
[14:32:21.877] 	at
[14:32:21.877] 	at
[14:32:21.877] 	at
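The parser is rejecting U+0006 because XML 1.0 only allows the code points in its `Char` production: tab, newline, carriage return, and U+0020 and above, excluding the surrogate block U+D800 to U+DFFF and U+FFFE/U+FFFF. As a rough diagnostic, the Python sketch below scans an update payload and reports each forbidden character with a 1-based row and column. The function names are hypothetical, and columns are counted in Python code points, so they may not line up exactly with the parser's own [row,col] numbers.

```python
def is_xml10_char(cp: int) -> bool:
    """Return True if code point cp is allowed by the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def find_invalid_xml_chars(payload: str) -> list:
    """Return (row, col, code point) for every character XML 1.0 forbids.

    Rows are split on LF only; columns are 1-based and counted in code points
    (an assumption: the parser's own column count may differ).
    """
    results = []
    row, col = 1, 0
    for ch in payload:
        if ch == "\n":
            row, col = row + 1, 0
            continue
        col += 1
        if not is_xml10_char(ord(ch)):
            results.append((row, col, ord(ch)))
    return results


if __name__ == "__main__":
    payload = "<add>\n<doc>ok\x06</doc>\n</add>"
    for row, col, cp in find_invalid_xml_chars(payload):
        print(f"[row,col]:[{row},{col}] U+{cp:04X}")
```

Running it on the small sample payload reports the single 0x6 byte on row 2.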

Our data comes from all sorts of places, and although we've tried to use
UTF-8 wherever we can, there are still cracks.

I would much rather have a document added with a replacement character
than have this error prevent the addition of 8K documents. That is what
happened here: this one character was in an 8K-document <add><doc>...<doc>...
batch, and only the documents before this character were added.

Is there something I can do on the Solr side to ignore or replace invalid
XML characters?
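One workaround, assuming the client builds the <add> XML itself before posting it to Solr, is to replace every forbidden code point with U+FFFD (REPLACEMENT CHARACTER) on the client. This is a client-side sketch under that assumption, not a Solr setting; `sanitize_for_xml`, `field_xml`, and the `body` field name are hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr

REPLACEMENT_CHAR = "\uFFFD"


def is_xml10_char(cp: int) -> bool:
    """Return True if code point cp is allowed by the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def sanitize_for_xml(text: str, replacement: str = REPLACEMENT_CHAR) -> str:
    """Replace each code point XML 1.0 forbids (e.g. U+0006) with `replacement`."""
    return "".join(ch if is_xml10_char(ord(ch)) else replacement for ch in text)


def field_xml(name: str, value: str) -> str:
    """Build one <field> element (hypothetical helper).

    Sanitize first, then escape &, < and >; escaping alone would not help,
    because XML 1.0 also rejects character references such as &#6;.
    """
    return f"<field name={quoteattr(name)}>{escape(sanitize_for_xml(value))}</field>"


if __name__ == "__main__":
    print(field_xml("body", "bad\x06byte & <tag>"))
```

Replacing rather than silently dropping the characters keeps the substitution visible in the indexed text, which is the behavior described above, and it means one stray control byte no longer aborts the rest of the batch.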