uima-dev mailing list archives

From Jens Grivolla <j+...@grivolla.net>
Subject Re: [jira] [Created] (UIMA-3969) Add JSON Serialization for CASs and UIMA Descriptors
Date Mon, 25 Aug 2014 22:54:53 GMT
Is the JSON serialization documented somewhere?

I saw that there appear to be quite a few alternative serializations. The
output seems to include something like a type system definition, but only
with a list of feature names, not their types, if I understood the format
correctly (@featureRefs seems to list the features that are not of
primitive types).

It would be very useful if the serialization allowed one to easily pull out
a partial CAS with just a subset of the views (by including only some
subtrees of the JSON structure) and to merge views into it. This might be
complicated, as I understand that the views define annotation indices, but
the same annotation can be indexed in several views, right?

-- Jens


On Fri, Aug 15, 2014 at 5:00 PM, Marshall Schor <msa@schor.com> wrote:

> In the current design, both UIMA arrays and lists are serialized as JSON
> arrays if the feature is marked with MultipleReferencesAllowed = false.
> So the "list" versus "array" nature in UIMA would be lost in the
> serialization unless the type system information is available.
>
> I suspect that in most cases, this won't be important. But if it is, it
> can be avoided by specifying MultipleReferencesAllowed = true in the UIMA
> type system description.
>
> -Marshall
>
> On 8/15/2014 8:55 AM, Marshall Schor wrote:
> > Hi,
> >
> > The trunk is beginning to have a mostly working version of this
> > serialization. I'm checking out the edge cases with test cases (and
> > finding the usual bugs, which are being fixed).
> >
> > I'm not very familiar with BSON, so if someone knows how a conversion
> > from JSON to that format might inform the JSON design, please post :-)
> >
> > -Marshall
> >
> >
> > On 8/13/2014 4:46 PM, Jens Grivolla wrote:
> >> Hi, I am very interested in this.
> >>
> >> In particular, we have so far stored CASs as compressed XMI in MySQL but
> >> are now moving to MongoDB. Having a lossless generic JSON serialization
> >> (equivalent to XMI) would be a much better fit, as MongoDB could then
> >> store it pretty much natively, and it would even enable some simple
> >> queries on the annotations directly in MongoDB.
> >>
> >> I'm not sure if there are any special considerations to make the JSON
> >> serialization fully compatible with MongoDB's BSON format.
> >>
> >> -- Jens
> >>
> >>
> >> On Wed, Jul 30, 2014 at 10:44 PM, Marshall Schor (JIRA) <dev@uima.apache.org> wrote:
> >>> Marshall Schor created UIMA-3969:
> >>> ------------------------------------
> >>>
> >>>              Summary: Add JSON Serialization for CASs and UIMA Descriptors
> >>>                  Key: UIMA-3969
> >>>                  URL: https://issues.apache.org/jira/browse/UIMA-3969
> >>>              Project: UIMA
> >>>           Issue Type: New Feature
> >>>           Components: Core Java Framework
> >>>     Affects Versions: 2.6.0SDK
> >>>             Reporter: Marshall Schor
> >>>             Assignee: Marshall Schor
> >>>             Priority: Minor
> >>>              Fix For: 2.6.1SDK
> >>>
> >>>
> >>> Recent trends toward moving things into the cloud motivated me to
> >>> consider what a JSON serialization of the CAS and descriptor metadata
> >>> (more particularly, type systems) might look like.
> >>>
> >>> I've put up a Wiki page with some of the thoughts so far in this
> >>> exploration, here:
> >>> https://cwiki.apache.org/confluence/display/UIMA/JSON+serialization+for+UIMA
> >>>
> >>> I'm also fooling around with a proof-of-concept implementation, based
> >>> on our current XMI serialization for the CAS, as well as our
> >>> MetaDataObject_impl serialization for UIMA descriptors, in order to
> >>> work out the details. There are additional nits (like how to configure
> >>> things) not yet worked out.
> >>>
> >>> Comments and discussion appreciated; I've put this up as a Jira to
> >>> record them together, but feel free to use email for any comments you
> >>> feel are better kept more ephemeral.
> >>>
> >>>
> >>>
> >>> --
> >>> This message was sent by Atlassian JIRA
> >>> (v6.2#6252)
> >>>
> >
> >
>
>
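Marshall's 15 Aug point, that UIMA arrays and lists with MultipleReferencesAllowed = false both end up as plain JSON arrays, can be illustrated with a small self-contained sketch. The classes and methods below (ListVsArrayJson, ConsList, listOf, arrayToJson, listToJson) are hypothetical stand-ins, not the UIMA CAS or JSON serializer API: an int array and a cons-cell list holding the same values are both written inline as the same JSON text, so a reader of the JSON alone cannot tell which kind produced it.

```java
import java.util.StringJoiner;

/**
 * Illustrative sketch only: hypothetical stand-ins, NOT the UIMA CAS API.
 * Models an array value and a UIMA-style linked list (cons cells ending in
 * an empty-list node) holding the same integers, both serialized inline.
 */
public class ListVsArrayJson {

    /** Hypothetical linked list: a head value plus a tail, or the empty node. */
    static final class ConsList {
        static final ConsList EMPTY = new ConsList(null, null);
        final Integer head; // null only for the EMPTY terminator
        final ConsList tail; // null only for the EMPTY terminator

        ConsList(Integer head, ConsList tail) {
            this.head = head;
            this.tail = tail;
        }

        boolean isEmpty() {
            return tail == null;
        }
    }

    /** Builds a cons list holding the given values in order. */
    static ConsList listOf(int... values) {
        ConsList result = ConsList.EMPTY;
        for (int i = values.length - 1; i >= 0; i--) {
            result = new ConsList(values[i], result);
        }
        return result;
    }

    /** Writes an array value inline as a JSON array. */
    static String arrayToJson(int[] values) {
        StringJoiner json = new StringJoiner(",", "[", "]");
        for (int v : values) {
            json.add(Integer.toString(v));
        }
        return json.toString();
    }

    /** Walks the cons cells and writes the elements inline as a JSON array. */
    static String listToJson(ConsList list) {
        StringJoiner json = new StringJoiner(",", "[", "]");
        for (ConsList cell = list; !cell.isEmpty(); cell = cell.tail) {
            json.add(Integer.toString(cell.head));
        }
        return json.toString();
    }

    public static void main(String[] args) {
        String fromArray = arrayToJson(new int[] {1, 2, 3});
        String fromList = listToJson(listOf(1, 2, 3));
        System.out.println(fromArray); // [1,2,3]
        System.out.println(fromList);  // [1,2,3]
        // Identical text: without type system information, a reader cannot
        // recover whether the value was an array or a list.
        System.out.println(fromArray.equals(fromList)); // true
    }
}
```

The sketch only models the inline case; per Marshall's note, specifying MultipleReferencesAllowed = true in the type system description avoids it, and that path is not modeled here.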
