lucene-solr-user mailing list archives

From Markus Jelsma <markus.jel...@openindex.io>
Subject Re: unindexible Chars?
Date Wed, 06 Apr 2011 20:50:29 GMT

> Once in a while, my post.jar seems to fail on commit. During the commit
> process, I have gotten a few errors. One is that an EOF character was found,
> and another is that a semicolon was expected after &the. I have also come
> across "> was expected".
> 
> So my question is: what characters do I need to strip out of the source text
> to ensure all posts are successful?

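The usual: the posted document _must_ be well-formed XML. In text content that
means escaping & and <, and removing characters that XML 1.0 does not allow
(most control characters).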
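
As a rough illustration of that cleanup, here is a minimal Python sketch. The
function name clean_for_xml is made up for this example, and it assumes the
source text is already decoded to a Unicode string. It drops characters outside
the XML 1.0 character range and then escapes &, < and >.

```python
import re
from xml.sax.saxutils import escape

# Anything NOT in the XML 1.0 "Char" production: tab, LF, CR,
# U+0020-U+D7FF, U+E000-U+FFFD, U+10000-U+10FFFF.
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def clean_for_xml(text):
    """Hypothetical helper: make a string safe to use as XML text content."""
    # Remove characters that no XML 1.0 parser will accept, even when escaped.
    stripped = _INVALID_XML_CHARS.sub("", text)
    # Escape &, < and > so they are not read as markup or entity references.
    return escape(stripped)
```

A bare "&the" in the source text would come out as "&amp;the", which is the
kind of input that typically produces the "semicolon expected" error.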

> 
> One side note: I have placed the text fields within <![CDATA[ ]]> before
> adding the document.

That's not a bad idea; then at least markup characters such as & and < in the
data embedded in the element cannot break the document. Usually these errors
indicate invalid XML.
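
One caveat with CDATA: the data itself must not contain "]]>", and characters
that XML 1.0 forbids are still forbidden inside a CDATA section. A small Python
sketch of the usual workaround for the first point (the name to_cdata is
hypothetical) splits any "]]>" across two adjacent sections:

```python
def to_cdata(text):
    """Hypothetical helper: wrap text in CDATA, splitting any embedded ']]>'."""
    # "]]>" would close the section early, so end the section after "]]"
    # and reopen a new one before ">".
    safe = text.replace("]]>", "]]]]><![CDATA[>")
    return "<![CDATA[" + safe + "]]>"
```

A parser reads the adjacent sections back as one continuous text value.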

Try running xmllint on one of the XML bodies that gives errors.
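
On the command line, "xmllint --noout file.xml" prints only the parse errors.
If xmllint is not at hand, a similar check can be sketched in Python with the
standard library parser. The function name check_well_formed is made up for
this example, and the parser's messages will differ from the ones Solr reports.

```python
import xml.etree.ElementTree as ET

def check_well_formed(xml_text):
    """Hypothetical helper: return None if xml_text parses, else an error string."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # position is a (line, column) tuple pointing at the offending input.
        line, column = err.position
        return "line %d, column %d: %s" % (line, column, err)
    return None
```

Running this on a failing body before posting shows where the first
well-formedness error is, which usually points straight at the bad character.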


> 
> Thanks,
> Charlie
