lucene-solr-user mailing list archives

From Brian Whitman <brian.whit...@variogr.am>
Subject invalid XML character
Date Sat, 01 Mar 2008 21:22:35 GMT
Once in a while we get this error:

javax.xml.stream.XMLStreamException: ParseError at [row,col]:[4,790470]
[14:32:21.877] Message: An invalid XML character (Unicode: 0x6) was found in the element content of the document.
[14:32:21.877] 	at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:588)
[14:32:21.877] 	at org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:318)
[14:32:21.877] 	at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:195)
...
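For reference, here is a minimal sketch that reproduces the same kind of failure using only the JDK's built-in StAX API (javax.xml.stream). The class name InvalidCharRepro and the method parsesCleanly are hypothetical. An otherwise well-formed <add><doc> payload that contains U+0006 is rejected with an XMLStreamException when the reader reaches that character.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class InvalidCharRepro {
    // Hypothetical helper: returns true if the whole document parses,
    // false if the StAX parser rejects it with an XMLStreamException.
    static boolean parsesCleanly(String xml) {
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                reader.next(); // throws once it scans an illegal XML character
            }
            reader.close();
            return true;
        } catch (XMLStreamException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String good = "<add><doc><field name=\"id\">1</field></doc></add>";
        // (char) 0x6 is the control character named in the error message.
        String bad = "<add><doc><field name=\"id\">1" + (char) 0x6 + "</field></doc></add>";
        System.out.println(parsesCleanly(good)); // true
        System.out.println(parsesCleanly(bad));  // false
    }
}
```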

Our data comes from all sorts of places, and although we've tried to use
UTF-8 wherever we can, there are still cracks.
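Note that U+0006 (the ACK control character) is perfectly valid UTF-8. The problem is that the XML 1.0 "Char" production excludes most C0 control characters outright, and they are also illegal when written as a reference such as &#x6;. Consistent UTF-8 therefore does not rule this error out. Below is a sketch of a pre-flight check that finds the first offending character in a field value before it is sent to Solr. XmlCharCheck and firstInvalidXmlChar are hypothetical names, not part of Solr.

```java
import java.nio.charset.StandardCharsets;

public class XmlCharCheck {
    // XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns the char index of the first code point XML 1.0 disallows, or -1.
    // Unpaired surrogates come back from codePointAt as 0xD800-0xDFFF and are flagged too.
    static int firstInvalidXmlChar(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isXmlChar(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        String text = "caf\u00e9 ok" + (char) 0x6 + "!";
        // U+0006 encodes to UTF-8 without complaint (one byte, 0x06)...
        System.out.println(text.getBytes(StandardCharsets.UTF_8).length); // 10
        // ...but it is still not a legal XML 1.0 character.
        System.out.println(firstInvalidXmlChar(text)); // 7
    }
}
```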

I would much rather have a document added with a replacement character
than have this error prevent the addition of 8K documents. That is what
happened here: this one character was in an 8K-document <add><doc>...<doc>...
run, and only the documents before it were added.

Is there something I can do on the Solr side to ignore or replace invalid
characters?
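One client-side workaround, assuming you control the code that builds the update XML, is to replace every code point that XML 1.0 disallows with U+FFFD, the Unicode replacement character, before the value is written into the <add> message. This is not a Solr setting. XmlSanitizer and sanitize are hypothetical names, sketched here:

```java
public class XmlSanitizer {
    // Same XML 1.0 Char production check:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Copies s, replacing each disallowed code point (including unpaired
    // surrogates) with U+FFFD so the document can still be indexed.
    static String sanitize(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (isXmlChar(cp)) {
                out.appendCodePoint(cp);
            } else {
                out.append('\uFFFD'); // replacement character
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "bad" + (char) 0x6 + "char";
        System.out.println(sanitize(raw).equals("bad\uFFFDchar")); // true
    }
}
```

Replacing rather than dropping leaves a visible marker where the bad character was. To drop such characters instead, remove the else branch. Either way, this only cleans the data before it is sent. It does not change how XmlUpdateRequestHandler handles invalid input.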





