lucene-solr-user mailing list archives

From Markus Jelsma
Subject Re: unindexible Chars?
Date Wed, 06 Apr 2011 20:50:29 GMT

> Once in a while, my post.jar seems to fail on commit. During the commit
> process, I have gotten a few errors. One is that an EOF character was found,
> and another is that a semicolon was expected after &the. I have also come
> across "a > was expected".
> So my question is: what characters do I need to strip out of the source text
> to ensure all posts are successful?

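The usual: the posted document _must_ be well-formed XML. A bare & or < in
field text has to be escaped as &amp; or &lt;, and control characters that
XML 1.0 does not allow have to be removed.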
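
As a rough illustration of that rule, here is a minimal Python sketch that
drops characters outside the XML 1.0 character range and then escapes the
markup characters. The helper names and the field names "id" and "text" are
assumptions made for the example; they are not taken from the original
poster's schema or from post.jar.

```python
import re
from xml.sax.saxutils import escape

# Anything outside the XML 1.0 "Char" production: most control characters,
# surrogates, and U+FFFE/U+FFFF.
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def clean_field_text(text):
    """Remove characters XML 1.0 forbids, then escape &, < and >."""
    return escape(_INVALID_XML_CHARS.sub("", text))


def make_add_doc(doc_id, body):
    """Build a small <add> document; the field names are hypothetical."""
    return (
        "<add><doc>"
        '<field name="id">%s</field>'
        '<field name="text">%s</field>'
        "</doc></add>"
    ) % (clean_field_text(doc_id), clean_field_text(body))
```

With this approach a fragment such as "&the" is written out as "&amp;the", so
the parser no longer expects a semicolon to close an entity reference.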

> One side note. I have placed the text fields within <![CDATA[ ]] before
> adding the document.

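That's not a bad idea; then at least markup characters such as & and < in the
data embedded in the element cannot break the document. A CDATA section still
cannot contain the sequence ]]> or characters that XML forbids. Errors like
these usually indicate invalid XML.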

Try running xmllint on one of the XML bodies that produces an error.
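
On the command line, "xmllint --noout" followed by the file name parses a file
and prints only the errors. If xmllint is not at hand, a similar
well-formedness check can be sketched in Python with the standard library
parser. The function name below is an assumption made for the example, and the
error wording will differ from what xmllint or Solr report.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return None if xml_text parses, else a string locating the error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # position is a (line, column) tuple supplied by the parser
        line, column = err.position
        return "line %d, column %d: %s" % (line, column, err)
    return None
```

Running this over each body before posting it should point at the line and
column of the first problem, for example an unescaped & in a text field.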

> Thanks,
> Charlie
