uima-dev mailing list archives

From Eddie Epstein <eaepst...@gmail.com>
Subject Re: UIMA AS Binary vs XMI serialization
Date Fri, 04 Jan 2013 16:15:49 GMT
Hi Pei,

The binary serialization option requires that client and service have
identical CAS TypeSystem definitions. With XMI it is only required
that the service's definition be a compatible subset of the client's.
Note that the client's TypeSystem will automatically integrate those
of all delegate services, so the potential problem here is for the
service to have a type with a feature that is incompatible with the
definition of that type on the client. An example would be a feature
named "foo" that is a float on the client and an integer on the
service.
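
To make the "foo" case concrete, here is a rough sketch of the kind of
check involved. It does not use the UIMA API: each TypeSystem is modelled
as a plain map (type name -> feature name -> range type name), and the
class name TypeSystemCheck, the method findConflicts, and the type
example.Token are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of UIMA: type systems are modelled as
// type name -> (feature name -> range type name).
class TypeSystemCheck {

    // Lists every feature that both sides define on the same type but with
    // different range types. Types or features known only to the service are
    // not conflicts, since merging would simply add them on the client.
    static List<String> findConflicts(Map<String, Map<String, String>> client,
                                      Map<String, Map<String, String>> service) {
        List<String> conflicts = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> type : service.entrySet()) {
            Map<String, String> clientFeatures = client.get(type.getKey());
            if (clientFeatures == null) {
                continue; // type is defined only by the service
            }
            for (Map.Entry<String, String> feature : type.getValue().entrySet()) {
                String clientRange = clientFeatures.get(feature.getKey());
                if (clientRange != null && !clientRange.equals(feature.getValue())) {
                    conflicts.add(type.getKey() + ":" + feature.getKey()
                            + " client=" + clientRange
                            + " service=" + feature.getValue());
                }
            }
        }
        return conflicts;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> client =
                Map.of("example.Token", Map.of("foo", "uima.cas.Float"));
        Map<String, Map<String, String>> service =
                Map.of("example.Token", Map.of("foo", "uima.cas.Integer"));
        // Reports the single conflicting feature, example.Token:foo
        System.out.println(findConflicts(client, service));
    }
}
```

In this model the binary option would need client.equals(service) to
hold, while XMI only needs findConflicts to return an empty list.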

The best approach to be safe for binary serialization would be for all
analytic components to import their TypeSystem definitions from a
common place.

Eddie

On Thu, Jan 3, 2013 at 4:25 PM, Chen, Pei
<Pei.Chen@childrens.harvard.edu> wrote:
> Hi,
> I was just curious about others' experience with the binary serialization.
>
> My original issue was documents that contained invalid XML chars, so I decided to try
> the binary serialization option within AS instead of replacing/modifying the special chars
> in the original docs.  As a side effect, I noticed that it's orders of magnitude faster.
> Just curious if there are any reasons not to make this the recommended/default when
> sending CASes around within AS.  Are there any downsides to be aware of (assuming that UIMA
> will have wrappers to abstract this from users for all of their implementations)?
>
> Caused by: org.xml.sax.SAXParseException; Trying to serialize non-XML 1.0 character: , 0x0
>         at org.apache.uima.util.XMLSerializer$CharacterValidatingContentHandler.checkForInvalidXmlChars(XMLSerializer.java:254)
>
> --Pei
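
For anyone who wants to find the offending characters before choosing a
serialization, the check behind the exception above can be approximated
with the Char production from the XML 1.0 specification. This is only a
sketch, not UIMA's own code; the class name Xml10Chars and both method
names are invented for the example.

```java
// Hypothetical helper, not part of UIMA: applies the XML 1.0 "Char" production
// (#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]).
class Xml10Chars {

    // True if the code point may appear in an XML 1.0 document.
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns the index of the first char that starts an invalid code point,
    // or -1 if the whole text is serializable as XML 1.0.
    static int firstInvalidIndex(String text) {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (!isXml10Char(cp)) {
                return i;
            }
            i += Character.charCount(cp); // step over surrogate pairs as one unit
        }
        return -1;
    }

    public static void main(String[] args) {
        // A NUL (0x0) at index 2, the same character as in the stack trace above
        System.out.println(firstInvalidIndex("ab\0c"));
    }
}
```

A document for which firstInvalidIndex returns -1 should not trip this
particular exception; anything else would need cleaning or a non-XML
serialization.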
