uima-dev mailing list archives

From Marshall Schor <...@schor.com>
Subject Re: SaxParserException
Date Fri, 06 Jun 2008 04:41:18 GMT
Ahmed Abdeen(Home) wrote:
> Hello UIMA Developers, I am getting the following error when I run the UIMA
> Document Analyzer.
> However, if I use the interactive mode it works fine. I can't tell which
> source file is causing this issue. I would appreciate any help.
> Thanks,
> Ahmed
>   
Please see 
http://incubator.apache.org/uima/downloads/releaseDocs/2.2.2-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html#ugr.tug.xmi_emf.xml_character_issues

It appears that some String data being serialized contains a character 
code that is invalid from an XML viewpoint, namely x'00'. 
There are several things you can do. 

1) Don't serialize this in the CAS consumer.  You may not realize this, 
but the Document Analyzer puts a CAS consumer after your annotator, 
for the purpose of serializing out the processed data.  If you write 
your own UIMA application, and don't use the Document Analyzer tool, you 
can choose what to write out and whether or not to use the XML 
serialization.

2) If it is acceptable in your application, you can change the invalid 
data to some alternate valid data.  This change would of course depend 
on your application requirements.
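
As a rough illustration of option 2, here is a minimal sketch in plain Java 
(no UIMA classes) that replaces every character not allowed by XML 1.0 with 
a substitute of your choosing. The class and method names (XmlCharSanitizer, 
isValidXml10, sanitize) are made up for this example and are not part of 
UIMA, and whether a blank is an acceptable substitute is an assumption that 
depends on your application.

```java
public final class XmlCharSanitizer {
    private XmlCharSanitizer() {}

    /** True if the code point is permitted by the XML 1.0 "Char" production. */
    static boolean isValidXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /**
     * Returns a copy of text in which each code point that is invalid in
     * XML 1.0 (for example x'00') is replaced by the given character.
     * Hypothetical helper for illustration only.
     */
    static String sanitize(String text, char replacement) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isValidXml10(cp)) {
                out.appendCodePoint(cp);   // keep valid characters unchanged
            } else {
                out.append(replacement);   // swap out the invalid character
            }
            i += Character.charCount(cp);  // advance 1 or 2 UTF-16 units
        }
        return out.toString();
    }
}
```

Every invalid code point occupies exactly one Java char, so replacing it 
with a single character (rather than deleting it) keeps the string length 
the same, and any annotation offsets computed on the text stay valid. You 
would apply something like this to the String before it ends up in the 
data being serialized.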

Does this answer your question?

-Marshall

