uima-dev mailing list archives

From Marshall Schor <...@schor.com>
Subject Re: [jira] [Created] (UIMA-3969) Add JSON Serialization for CASs and UIMA Descriptors
Date Fri, 15 Aug 2014 16:00:53 GMT
In the current design, both UIMA arrays and lists are serialized as JSON
arrays if the feature is marked MultipleReferencesAllowed = false.
So the "list" versus "array" distinction in UIMA is lost in the serialization
unless the type system information is available.

I suspect that in most cases, this won't be important.  But if it is, the loss
can be avoided by specifying MultipleReferencesAllowed = true in the UIMA type
system.
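
To illustrate the point about lists and arrays, here is a minimal sketch in
Python. It does not use the UIMA API or the actual JSON layout from the wiki
page: ListNode, serialize_feature, and the feature name "values" are all
hypothetical stand-ins. It only shows why a reader of the JSON cannot tell the
two apart once both are embedded as plain JSON arrays.

```python
import json


# Hypothetical stand-in for a UIMA list value; this is NOT the UIMA API.
class ListNode:
    """A linked-list node holding a head value and a reference to the tail."""

    def __init__(self, head, tail=None):
        self.head = head
        self.tail = tail


def list_to_values(node):
    """Walk the linked list and collect the head values in order."""
    values = []
    while node is not None:
        values.append(node.head)
        node = node.tail
    return values


def serialize_feature(value):
    """Embed a non-shared list or array value directly as a JSON array.

    This models the case where multiple references are not allowed, so the
    value is written inline rather than as a separate object.
    """
    if isinstance(value, ListNode):
        return list_to_values(value)
    return list(value)  # array-like value


as_array = (1, 2, 3)                             # models a UIMA array
as_list = ListNode(1, ListNode(2, ListNode(3)))  # models a UIMA list

json_from_array = json.dumps({"values": serialize_feature(as_array)})
json_from_list = json.dumps({"values": serialize_feature(as_list)})

print(json_from_array)
print(json_from_list)
# The two strings are identical, so the list/array distinction is gone
# unless the type system is consulted when reading the JSON back.
print(json_from_array == json_from_list)
```

A deserializer working from this output alone would have to consult the type
system to decide whether to rebuild a list or an array for the feature.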


On 8/15/2014 8:55 AM, Marshall Schor wrote:
> Hi,
> The trunk is beginning to have a mostly working version of this serialization. 
> I'm checking out the edge cases with test cases (and finding the usual bugs that
> are being fixed).
> I'm not very familiar with BSON, so if someone knows how a conversion to that
> format from JSON might inform the JSON design, please post :-)
> -Marshall
> On 8/13/2014 4:46 PM, Jens Grivolla wrote:
>> Hi, I am very interested in this.
>> In particular, we have so far stored CASs as compressed XMI in MySQL but
>> are now moving to MongoDB. Having a lossless generic JSON serialization
>> (equivalent to XMI) would be a much better fit as MongoDB could then store
>> it pretty much natively and it would even enable some simple queries on the
>> annotations directly in MongoDB.
>> I'm not sure if there are any special considerations to make the JSON
>> serialization fully compatible with MongoDB's BSON format.
>> -- Jens
>> On Wed, Jul 30, 2014 at 10:44 PM, Marshall Schor (JIRA) <dev@uima.apache.org> wrote:
>>> Marshall Schor created UIMA-3969:
>>> ------------------------------------
>>>              Summary: Add JSON Serialization for CASs and UIMA Descriptors
>>>                  Key: UIMA-3969
>>>                  URL: https://issues.apache.org/jira/browse/UIMA-3969
>>>              Project: UIMA
>>>           Issue Type: New Feature
>>>           Components: Core Java Framework
>>>     Affects Versions: 2.6.0SDK
>>>             Reporter: Marshall Schor
>>>             Assignee: Marshall Schor
>>>             Priority: Minor
>>>              Fix For: 2.6.1SDK
>>> Recent trends toward moving things into the cloud motivated me to consider
>>> what a JSON serialization of the CAS and descriptor metadata (more
>>> particularly, type systems) might look like.
>>> I've put up a Wiki page with some of the thoughts so far in this
>>> exploration, here:
>>> https://cwiki.apache.org/confluence/display/UIMA/JSON+serialization+for+UIMA
>>> I'm also fooling around with a proof-of-concept implementation, based on
>>> our current XMI serialization for the CAS, as well as our
>>> MetaDataObject_impl serialization for UIMA descriptors, in order to work
>>> out the details.  There are additional nits (like how to configure things)
>>> not yet worked out.
>>> Comments and discussion appreciated; I've put this up as a Jira to record
>>> them together - but feel free to use email also for any comments you feel
>>> might be better being more ephemeral.
>>> --
>>> This message was sent by Atlassian JIRA
>>> (v6.2#6252)
