uima-dev mailing list archives

From Marshall Schor <...@schor.com>
Subject Re: [jira] [Created] (UIMA-3969) Add JSON Serialization for CASs and UIMA Descriptors
Date Tue, 26 Aug 2014 03:33:54 GMT

On 8/25/2014 6:54 PM, Jens Grivolla wrote:
> Is the JSON serialization documented somewhere?
Yes, there's a chapter in the reference book.  Until it's released, you can
build it yourself (uima-docbook-references).

There are also lots of Javadocs in the main implementing class:
XmiCasSerializer.  (It's in this class because it shares a lot of the machinery
with Xmi serialization).

>
> I saw that there appear to be quite a few alternative serializations. It
> seems to include something like a typesystem definition, but only with a
> list of feature names, not their types, if I understood the format
> correctly (@featureRefs has a list of the features that are not of
> primitive types, it seems).
@featureRefs lists only those features which are "references" to other feature
structures.

You're correct in noticing that the feature "range" types are not present.
This is because the serialization is to JSON, which has native representations
for these things: collections (which could be UIMA Arrays or Lists) become JSON
arrays, and ranges that are boolean are representable by JSON true and false
values.  There is no distinction between a byte/short/int/long, because those
are all represented as a JSON "number".  And so forth...
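
To make this concrete, here is a minimal Python sketch of what a consumer sees
after parsing.  The slot names (aByte, aLong, flag, items) are made up for
illustration and are not taken from the actual serialized format.

```python
import json

# Hypothetical slot names; the point is only what JSON itself can express.
parsed = json.loads('{"aByte": 5, "aLong": 5, "flag": true, "items": [1, 2, 3]}')

# After parsing, a byte-ranged slot and a long-ranged slot look the same,
# while booleans and collections keep their native JSON shape.
kinds = {name: type(value).__name__ for name, value in parsed.items()}
print(kinds)
```

Both integer slots come back as the same Python type, so the original UIMA
range type cannot be recovered from the value alone.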

The JSON serialization for a CAS can optionally include parts of the type
system: It can include what the supertypes are for serialized types (to enable
iterating over a type and all of its subtypes, like CAS iterators normally do);
it can also identify which slots that appear to have number values are actually
to be interpreted as references to other feature structures.  Otherwise, the
serialized form might have a slot "foo" : 111  which is a number value, and a
slot "bar" : 112 which is a reference to another feature structure whose ID is
112.  This extra information (in @featureRefs) gives the user of the JSON
serialized form a way to distinguish these two cases.
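
A rough Python sketch of how a consumer might use @featureRefs to tell the two
cases apart.  The surrounding layout (the "types" and "featureStructures" keys,
the "@type" slot, the type name MyType) is an assumption made up for this
example and is not the actual serialized format; only @featureRefs and the
foo/bar slots come from the description above.

```python
import json

# Hypothetical, simplified layout: NOT the real UIMA JSON format.
doc = json.loads("""
{
  "types": {"MyType": {"@featureRefs": ["bar"]}},
  "featureStructures": {
    "110": {"@type": "MyType", "foo": 111, "bar": 112},
    "112": {"@type": "MyType", "foo": 7, "bar": null}
  }
}
""")

def resolve(doc, fs_id):
    """Return a copy of one feature structure with reference slots followed."""
    fs = doc["featureStructures"][str(fs_id)]
    ref_features = set(doc["types"][fs["@type"]]["@featureRefs"])
    resolved = {}
    for name, value in fs.items():
        if name in ref_features and value is not None:
            # The number is an ID: look up the feature structure it points to.
            resolved[name] = doc["featureStructures"][str(value)]
        else:
            # The number (or other value) is plain data: keep it as-is.
            resolved[name] = value
    return resolved

fs = resolve(doc, 110)
print(fs["foo"])         # plain number
print(fs["bar"]["foo"])  # slot of the referenced feature structure
```

Without the @featureRefs list, nothing in the values themselves says that 111
is data while 112 is an ID.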

>
> It would be very useful if the serialization allowed one to easily pull out
> a partial CAS with just a subset of the views (by only including some
> subtrees of the JSON structure), and merge views into it.
Another optional part of the serialization is a list of views, together with an
array of numbers, each of which identifies a serialized Feature Structure that
is indexed in that view.
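
As a sketch of how that view list could be used to pull out a partial CAS, the
Python below keeps only the chosen views and the Feature Structures indexed in
them.  The layout (the "views" and "featureStructures" keys, the view names,
the Token type) is hypothetical and only mirrors the description above; it is
not the actual serialized format.

```python
# Hypothetical layout, for illustration only.
cas = {
    "views": {"_InitialView": [1, 2], "translation": [3]},
    "featureStructures": {
        1: {"@type": "Token", "begin": 0, "end": 5},
        2: {"@type": "Token", "begin": 6, "end": 11},
        3: {"@type": "Token", "begin": 0, "end": 7},
    },
}

def partial_cas(cas, view_names):
    """Keep only the named views and the feature structures indexed in them."""
    views = {name: cas["views"][name] for name in view_names}
    wanted = {fs_id for ids in views.values() for fs_id in ids}
    structures = {
        fs_id: fs
        for fs_id, fs in cas["featureStructures"].items()
        if fs_id in wanted
    }
    return {"views": views, "featureStructures": structures}

subset = partial_cas(cas, ["translation"])
print(sorted(subset["featureStructures"]))
```

Note that this sketch does not follow references from the kept Feature
Structures to ones that are not indexed; a real extraction would probably need
to use @featureRefs to pull those in as well.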
>  This might be
> complicated, as I understand that the views define annotation indices, but
> the same annotation can be indexed in several views, right?

Feature Structures can be classified into "Annotations" and other types (those
that are not subtypes of Annotation).

Annotations are special - they have an implied reference to a particular subject
of analysis.  So they are restricted to being indexed in the view that is
associated with that subject-of-analysis.

Other types (those that are not subtypes of Annotation or, more precisely,
AnnotationBase) do not have this restriction, and can be indexed in multiple
views.

See
http://uima.apache.org/d/uimaj-2.6.0/tutorials_and_users_guides.html#ugr.tug.aas.annotations_associated_sofa.

Let me know where the documentation might be improved :-)

-Marshall
>
> -- Jens
>
>
>

