uima-dev mailing list archives

From Eddie Epstein <eaepst...@gmail.com>
Subject Re: Is UIMA-AS missing some encoding spec?
Date Fri, 08 Jun 2018 12:28:34 GMT
How about the service side on Windows? Will that also need the same changes?
Should Windows and Linux be compatible by default? I'm pretty sure they once
were.
Eddie

On Thu, Jun 7, 2018 at 3:01 PM, Jaroslaw Cwiklik <uimaee@gmail.com> wrote:

> I've created JIRA for this: https://issues.apache.org/jira/browse/UIMA-5791
> Not yet sure how to fix this. Will take a look next week. If I understand
> the requirements right, the default encoding should be UTF-8 when
> deserializing service metadata.
> There should also be a way to override the default. Seems like we need a
> new cmdline arg (or property) for the client to override the default encoding.
> Jerry
>
> On Thu, Jun 7, 2018 at 9:31 AM Marshall Schor <msa@schor.com> wrote:
>
> > Recently, we debugged an issue where a user had a UIMA-AS client running on
> > Windows, connecting to a UIMA-AS service running on Linux in the cloud.
> >
> > The Linux box was set up with LANG etc. set to UTF-8. Windows did not have
> > any special configuration.
> >
> > After a successful service deployment on Linux, the Windows client sent a
> > get-meta request, received a "message string" from the transport, and tried
> > to parse it with the XML parser, which returned an error:
> >
> > org.xml.sax.SAXParseException: Invalid byte 1 of 1-byte UTF-8 sequence.
> > at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
> > at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
> > at org.apache.uima.util.impl.XMLParser_impl.parse(XMLParser_impl.java:202)
> >
> > Eventually the user worked around this by launching the Windows client
> > Java with the extra parameter
> >
> >   -D"file.encoding=UTF-8"
> >
> > which made this problem go away (but may introduce other issues).
> >
> > Should UIMA-AS communication protocols specify UTF-8 explicitly, instead of
> > defaulting to "platform defaults", which seem to cause issues if the
> > defaults aren't compatible?
> >
> > -Marshall
> >
>
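
For illustration only, here is a minimal Java sketch of the idea discussed in
the thread: use an explicit charset (UTF-8 by default) when converting message
bytes to and from strings, with a property to override it. It is not the
UIMA-AS implementation and says nothing about how UIMA-5791 was actually
resolved. The class name, the method names, and the property name
"example.meta.encoding" are all hypothetical, and ISO-8859-1 stands in for a
legacy Windows default code page. The isValidUtf8 check shows the kind of
mismatch that can produce an "Invalid byte ... UTF-8 sequence" error: text
containing non-ASCII characters, encoded under a legacy charset, is generally
not well-formed UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class MetaEncodingSketch {
    // Hypothetical property name invented for this sketch; not a UIMA-AS setting.
    static final String ENCODING_PROPERTY = "example.meta.encoding";

    // Charset used for metadata: UTF-8 unless the property overrides it.
    static Charset metaCharset() {
        String name = System.getProperty(ENCODING_PROPERTY);
        return (name == null || name.isEmpty())
                ? StandardCharsets.UTF_8
                : Charset.forName(name);
    }

    // Encode with the explicit charset instead of the platform default.
    static byte[] encodeMeta(String xml) {
        return xml.getBytes(metaCharset());
    }

    // Decode with the same explicit charset.
    static String decodeMeta(byte[] bytes) {
        return new String(bytes, metaCharset());
    }

    // Strict check: a new CharsetDecoder reports malformed input by default,
    // so decode() throws instead of silently substituting characters.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String xml = "<name>caf\u00e9</name>"; // contains one non-ASCII character

        // Bytes produced under a legacy single-byte charset.
        byte[] legacy = xml.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("legacy bytes valid as UTF-8: " + isValidUtf8(legacy));

        // Round trip with the explicit charset on both sides.
        String roundTrip = decodeMeta(encodeMeta(xml));
        System.out.println("explicit round trip intact: " + xml.equals(roundTrip));
    }
}
```

In this sketch the override would be passed on the command line as
-Dexample.meta.encoding=<charset name>, which affects only the metadata
conversion rather than the whole JVM the way -Dfile.encoding does. Both sides
of the connection would need to agree on the charset for this to help.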
