uima-dev mailing list archives

From Marshall Schor <...@schor.com>
Subject Re: Is UIMA-AS missing some encoding spec?
Date Sat, 09 Jun 2018 21:06:31 GMT
I think the communication between uima-as service and client ought to be:

1) an internal detail, not something "spec'd" which we have to adhere to

2) done in such a way as to always "work", regardless of what various "OS" 
defaults are doing.

Whatever gets implemented, shouldn't it work not just for getMetaData data, but 
also other kinds of data passed between client and server?

I'm thinking the problem might be one of underspecifying encoding/decoding 
configurations in the various communication subsystems and APIs we use 
internally, allowing them to pick up the OS "defaults".
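
To illustrate the kind of underspecification I mean, here is a minimal Java sketch. It is not UIMA-AS code: the class and helper names are hypothetical, and ISO-8859-1 merely stands in for whatever non-UTF-8 default a given OS might supply. Naming the charset on both the encoding and the decoding side makes the round trip independent of the platform default; a mismatch silently corrupts non-ASCII text.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetSketch {

    // Hypothetical sender side: always encode with an explicit charset,
    // never with the no-argument String.getBytes().
    static byte[] encode(String message) {
        return message.getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical receiver side: decode with the same explicit charset.
    static String decode(byte[] wire) {
        return new String(wire, StandardCharsets.UTF_8);
    }

    // Simulates a receiver that decodes with some other charset,
    // as would happen if it relied on a non-UTF-8 platform default.
    static String decodeWith(byte[] wire, Charset charset) {
        return new String(wire, charset);
    }

    public static void main(String[] args) {
        String xml = "<name>caf\u00e9</name>";
        byte[] wire = encode(xml);

        // Same charset on both ends: the text survives.
        System.out.println("explicit UTF-8 round trip ok: "
                + decode(wire).equals(xml));

        // Mismatched charset: the non-ASCII character is corrupted.
        System.out.println("mismatched decode ok: "
                + decodeWith(wire, StandardCharsets.ISO_8859_1).equals(xml));
    }
}
```

The same reasoning would apply to any reader, writer, or messaging API that accepts an optional charset argument: pass it explicitly rather than leaving it out.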


On 6/7/2018 3:01 PM, Jaroslaw Cwiklik wrote:

> I've created JIRA for this: https://issues.apache.org/jira/browse/UIMA-5791
> Not yet sure how to fix this. Will take a look next week. If I understand
> the requirements right, the default encoding should be UTF-8 when
> deserializing service metadata.
> There should also be a way to override the default. Seems like we need a
> new cmdline arg (or property) for the client to override default encoding.
> Jerry
> On Thu, Jun 7, 2018 at 9:31 AM Marshall Schor <msa@schor.com> wrote:
>> Recently, we debugged an issue where a user had a UIMA-AS client running on
>> Windows, connecting to a UIMA-AS service running on Linux in the cloud.
>> The Linux box was set up with LANG etc. set to UTF-8. Windows did not have any
>> special configuration.
>>
>> After a successful service deployment on Linux, the Windows client sent a get
>> meta, which received a "message string" from the transport, and tried to parse
>> it with the XML parser, but that returned an error:
>> org.xml.sax.SAXParseException: Invalid byte 1 of 1-byte UTF-8 sequence.
>> at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
>> at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
>> at org.apache.uima.util.impl.XMLParser_impl.parse(XMLParser_impl.java:202)
>> Eventually the user worked around this by launching the Windows client Java
>> with the extra parameter
>>    -D"file.encoding=UTF-8"
>> which made this problem go away (but may introduce other issues).
>>
>> Should UIMA-AS communication protocols specify UTF-8 explicitly, instead of
>> defaulting to "platform defaults", which seem to cause issues if the defaults
>> aren't compatible?
>> -Marshall
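
For reference, one plausible way an "Invalid byte ... UTF-8 sequence" error like the one quoted above can arise is sketched below. This is an assumption about the failure mode, not a trace of the actual UIMA-AS code path: the class and method names are hypothetical, the JDK's built-in XML parser is used instead of UIMA's XMLParser, and ISO-8859-1 stands in for a non-UTF-8 platform default. The XML text declares UTF-8, but is turned into bytes with a different charset before parsing, so the parser sees bytes that are not valid UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class XmlEncodingMismatchSketch {

    // XML that declares UTF-8 and contains one non-ASCII character.
    static final String XML =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?><name>caf\u00e9</name>";

    static DocumentBuilder newBuilder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Converts the string to bytes with the given charset, then parses the bytes.
    // Returns false if the parser rejects the input.
    static boolean parsesFromBytes(String xml, Charset charset) {
        try {
            newBuilder().parse(new ByteArrayInputStream(xml.getBytes(charset)));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }

    // Parses the characters directly, so no byte decoding is involved.
    static boolean parsesFromChars(String xml) {
        try {
            newBuilder().parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("bytes as UTF-8:      "
                + parsesFromBytes(XML, StandardCharsets.UTF_8));
        System.out.println("bytes as ISO-8859-1: "
                + parsesFromBytes(XML, StandardCharsets.ISO_8859_1));
        System.out.println("character stream:    "
                + parsesFromChars(XML));
    }
}
```

If the message is already a Java String when it reaches the client, handing the parser a character stream, or encoding with an explicit UTF-8 charset, avoids any dependence on file.encoding.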
