uima-dev mailing list archives

From "Richard Eckart de Castilho (JIRA)" <...@uima.apache.org>
Subject [jira] [Commented] (UIMA-2101) CasToInlineXml adds whitespace
Date Tue, 15 Apr 2014 08:01:37 GMT

    [ https://issues.apache.org/jira/browse/UIMA-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969315#comment-13969315 ]

Richard Eckart de Castilho commented on UIMA-2101:
--------------------------------------------------

Are you ramping up towards a new release (in which case, is there a time frame within
which I should look into this), or are you just doing spring cleaning?

> CasToInlineXml adds whitespace
> ------------------------------
>
>                 Key: UIMA-2101
>                 URL: https://issues.apache.org/jira/browse/UIMA-2101
>             Project: UIMA
>          Issue Type: Bug
>    Affects Versions: 2.3.1SDK
>            Reporter: Steven Bethard
>         Attachments: UIMA-2101-eckart-20110329.patch
>
>
> CasToInlineXml adds indentation between adjacent XML elements. For example, for a single-character
> document with a single annotation covering that one character, it writes:
> {noformat}
> <?xml version="1.0" encoding="UTF-8"?>
> <Document>
>     <uima.tcas.DocumentAnnotation sofa="Sofa" begin="0" end="1" language="x-unspecified">
>         <uima.tcas.Annotation sofa="Sofa" begin="0" end="1"> </uima.tcas.Annotation>
>     </uima.tcas.DocumentAnnotation>
> </Document>
> {noformat}
> I think it should instead write everything in a single line, that is:
> {noformat}
> <?xml version="1.0" encoding="UTF-8"?>
> <Document><uima.tcas.DocumentAnnotation sofa="Sofa" begin="0" end="1" language="x-unspecified"><uima.tcas.Annotation sofa="Sofa" begin="0" end="1"> </uima.tcas.Annotation></uima.tcas.DocumentAnnotation></Document>
> {noformat}
> I believe this could be fixed by replacing the line:
> {noformat}
> XMLSerializer sax2xml = new XMLSerializer(byteArrayOutputStream);
> {noformat}
> with the line:
> {noformat}
> XMLSerializer sax2xml = new XMLSerializer(byteArrayOutputStream, false);
> {noformat}
> I think it's a bug that CasToInlineXml changes the character offsets, but I would
> also be happy if there were an alternate constructor or a method on CasToInlineXml that allowed
> disabling the formatting.
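
For illustration only, here is a minimal sketch of the effect described in the issue. It does not use UIMA at all: it assumes that the formatting flag behaves much like the JDK's standard OutputKeys.INDENT output property, and it serializes a small hand-built DOM with the JDK Transformer. The class name InlineXmlIndentDemo and the method serialize are hypothetical names invented for this example.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class; not part of UIMA.
class InlineXmlIndentDemo {

    // Serializes a tiny nested document, with or without indentation.
    // Assumption: this mirrors the formatted/unformatted switch discussed
    // in the issue, using only the JDK's identity Transformer.
    static String serialize(boolean formatted) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = doc.createElement("Document");
        Element docAnno = doc.createElement("uima.tcas.DocumentAnnotation");
        Element anno = doc.createElement("uima.tcas.Annotation");
        anno.setTextContent(" "); // the single character of document text
        docAnno.appendChild(anno);
        root.appendChild(docAnno);
        doc.appendChild(root);

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        transformer.setOutputProperty(OutputKeys.INDENT, formatted ? "yes" : "no");

        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String indented = serialize(true);
        String compact = serialize(false);
        System.out.println("formatted:\n" + indented);
        System.out.println("unformatted:\n" + compact);
        // Extra characters here are whitespace that was never in the document text,
        // which is what shifts the character offsets.
        System.out.println("length difference: " + (indented.length() - compact.length()));
    }
}
```

The exact whitespace in the formatted output depends on the JDK version, so the sketch only reports the length difference. The point is that any inserted line breaks or indentation become part of the inline XML text, which is why the issue asks for a way to turn formatting off.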



--
This message was sent by Atlassian JIRA
(v6.2#6252)
