uima-dev mailing list archives

From "Marshall Schor (JIRA)" <...@uima.apache.org>
Subject [jira] [Commented] (UIMA-2849) XMLSerializer is not robust to ascii control characters
Date Thu, 13 Jun 2013 20:03:20 GMT

    [ https://issues.apache.org/jira/browse/UIMA-2849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13682639#comment-13682639 ]

Marshall Schor commented on UIMA-2849:
--------------------------------------

A partial fix: improved the error message produced when this failure happens, so that it
identifies the position of the bad character.

> XMLSerializer is not robust to ascii control characters 
> --------------------------------------------------------
>
>                 Key: UIMA-2849
>                 URL: https://issues.apache.org/jira/browse/UIMA-2849
>             Project: UIMA
>          Issue Type: Bug
>          Components: Core Java Framework
>    Affects Versions: 2.4.0SDK
>            Reporter: Matthew Hatem
>
> If any string in the CAS contains an ASCII control character, the XMLSerializer fails
> with the exception below. XMLSerializer appears to escape other characters that need
> special handling in XML, such as '&' and '<'. Perhaps it would be appropriate to remove
> control characters (or to escape these characters as well, in the case of XML 1.1).
> Workaround: ensure that no string stored in the CAS contains an ASCII control character.
>
> org.xml.sax.SAXParseException: Trying to serialize non-XML 1.0 character: , 0x1c
> 	at org.apache.uima.util.XMLSerializer$CharacterValidatingContentHandler.checkForInvalidXmlChars(XMLSerializer.java:254)
> 	at org.apache.uima.util.XMLSerializer$CharacterValidatingContentHandler.startElement(XMLSerializer.java:174)
> 	at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.startElement(XmiCasSerializer.java:1003)
> 	at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.encodeFS(XmiCasSerializer.java:755)
> 	at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.encodeIndexed(XmiCasSerializer.java:700)
> 	at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.serialize(XmiCasSerializer.java:268)
> 	at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.access$700(XmiCasSerializer.java:108)
> 	at org.apache.uima.cas.impl.XmiCasSerializer.serialize(XmiCasSerializer.java:1516)
> 	at org.apache.uima.cas.impl.XmiCasSerializer.serialize(XmiCasSerializer.java:1496)
> 	at bugs.UimaXMIBug.writeXmi(UimaXMIBug.java:68)
> 	at bugs.UimaXMIBug.main(UimaXMIBug.java:38)
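
The workaround described in the issue (keeping control characters out of strings before they are stored in the CAS) could be sketched roughly as follows. This is only an illustration and not part of UIMA: the class name XmlCharFilter and its methods are hypothetical, and the validity check simply follows the Char production of the XML 1.0 specification. Each disallowed character is replaced rather than removed, on the assumption that string lengths should stay the same so that annotation offsets into the text remain valid.

```java
// Hypothetical helper, not part of UIMA: replaces characters that XML 1.0
// does not allow, so a string can be stored in the CAS and serialized later.
final class XmlCharFilter {
    private XmlCharFilter() {}

    /** True if the code point matches the XML 1.0 "Char" production. */
    static boolean isValidXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /**
     * Returns a copy of s in which every code point that is invalid in
     * XML 1.0 is replaced by the given character. Every invalid code point
     * occupies exactly one Java char (control characters, lone surrogates,
     * U+FFFE, U+FFFF), so the result has the same length as the input.
     */
    static String replaceInvalidXml10(String s, char replacement) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (isValidXml10(cp)) {
                out.appendCodePoint(cp);
            } else {
                out.append(replacement);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

For example, XmlCharFilter.replaceInvalidXml10("before\u001cafter", ' ') yields "before after", where 0x1c is the character reported in the exception above. Removing the character outright would also avoid the exception, but it would shift the offsets of everything after it.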

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
