uima-dev mailing list archives

From Marshall Schor <...@schor.com>
Subject Re: Some questions about UIMA arrays of FeatureStructures with a specified component type
Date Thu, 21 Apr 2016 13:50:51 GMT
Additional detail: there is a serializer for type systems,
TypeSystem2Xml.typeSystem2Xml(typeSystem, outputStream).

The implementation excludes from the serialization any types it thinks are
"built-in", and this includes all array types.

So no serialization is done of types like Annotation[].

-Marshall

On 4/20/2016 3:38 PM, Marshall Schor wrote:
> Apologies for the long email.  Short version - it appears that arrays of
> specific Feature Structure types (e.g. myFoo[]) have some holes in their
> support; some possible ways forward are suggested below.
>
> -----------------
>
> UIMA has some support for arrays and lists of FeatureStructures (FSs) with the
> elements restricted to a particular FS type. This is supported in the type
> system descriptors, where you can specify in the "featureDescription" an
> "elementType".
>
> One use could be to use these types with indexing; you can get an index over all
> instances of arrays of some specific type.
>
> In the implementation, I see further support.  It is possible to create a type
> which is a FS array with a component type, using the TypeSystemManager API:
> getArrayType(component_type).  This creates (or just retrieves, if already
> created) a type whose name is the name of the component_type, suffixed with
> "[]".  Example:  "uima.tcas.Annotation[]".
>
> You can also specify these types in the XML type descriptor, but not directly;
> you can only specify them in the "feature" description for another type, where
> that feature is referencing it.
>
> Actually creating instances of these types seems not quite implemented.  To
> create an array, the API needs to include the array length.  Looking at the
> non-JCas APIs, we have in the CAS interface these methods for creating arrays:
>
> createBooleanArray(length)
> createStringArray(length)
>   etc.
> createArrayFS(length)
>
> but there's no
>
> createArray(type, length)
>
> The LowLevelCAS interface has this though:
>
> ll_createArray(type, length)
>
> I couldn't find any tests that actually create one of these objects, using this API.
>
> Modifying a test case to create one of these, and then attempting to serialize
> it with both XMI and XCAS serialization, produced invalid XML if the array was
> in fact serialized as a separate object.  This is the case in XCAS, and also in
> XMI when the array is referenced from a feature description that is marked as
> "multipleReferencesAllowed".
>
> In these cases, the convention to serialize a FeatureStructure is to serialize
> it using the name of the type as the XML element name.  For example, the type
> "Foo" gets serialized as <Foo ... />.  But the name of these types ends in "[]",
> e.g. Annotation[].  And the characters "[]" are not legal as part of an XML
> element name.
>
> There is some code that in some (but not all) cases serializes this using the
> element name "FSArray" instead.  But for these, the deserialization code
> produces FSArray instances instead of instances of the more specific type.
> When the deserialized object is referenced from another type via a feature
> having an "elementType" specification (in the receiving type system), that
> information could be used to fix up the deserialized array instance's type to
> that spec's component type.
>
> It also appears that the CasCopier doesn't support creating these kinds of objects.
>
> I've probably missed some things in my analysis of this.  I'm thinking we ought
> to fix the CasCopier and XMI and XCAS serialization to work when serializing
> these objects (by serializing them as FSArray, although that loses the component
> type info).  When deserializing XMI and XCAS, these FSArray objects could be
> updated to include the element-type information when and if that was available
> (for instance, if there was a reference from some typed feature having an
> element type).
>
> This isn't perfect; to be 100% accurate, we would need to be able to record the
> element type in the serialized stream for these instances.
>
> I haven't (yet) thought much about JCas for this issue, or support for fslists.
>
> Other thoughts?
>
> -Marshall
>
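
The quoted message notes that a type name ending in "[]" cannot be used as an
XML element name. A minimal way to see this with only the JDK's DOM API is
sketched below; it does not use UIMA at all, and the class and method names
(ElementNameCheck, isLegalElementName) are made up for this illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;

class ElementNameCheck {
    // Hypothetical helper: returns true if the DOM implementation accepts
    // `name` as an element name, false if it rejects it.
    static boolean isLegalElementName(Document doc, String name) {
        try {
            doc.createElement(name);
            return true;
        } catch (DOMException e) {
            // Raised when the name contains characters not allowed in an XML name.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();
        // "Foo" and "FSArray" are ordinary names; "Annotation[]" ends in "[]".
        for (String name : new String[] {"Foo", "FSArray", "Annotation[]"}) {
            System.out.println(name + " -> " + isLegalElementName(doc, name));
        }
    }
}
```

This only checks the name against the DOM implementation's rules; it says
nothing about how the XMI or XCAS serializers actually build their output.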

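The fix-up idea in the quoted message (retyping a deserialized generic FSArray
from the "elementType" of the feature that references it) could look roughly
like the toy model below. This is only a sketch under stated assumptions: none
of these classes are UIMA classes, all names (ArrayFixupSketch, ArrayInstance,
FeatureDecl, fixUp) are hypothetical, and the only detail taken from the
message is that the specific array type is named by suffixing the component
type name with "[]".

```java
class ArrayFixupSketch {
    // Hypothetical stand-in for a deserialized array instance (not a UIMA class).
    static final class ArrayInstance {
        // What a generic deserializer would produce for the "FSArray" element.
        String typeName = "FSArray";
    }

    // Hypothetical stand-in for a feature declaration in the receiving type system.
    static final class FeatureDecl {
        final String name;
        final String elementType; // null when no elementType was specified

        FeatureDecl(String name, String elementType) {
            this.name = name;
            this.elementType = elementType;
        }
    }

    // If the referencing feature declares an elementType, retype the generic
    // array to "<elementType>[]"; otherwise leave it as a plain FSArray.
    static void fixUp(ArrayInstance array, FeatureDecl referencingFeature) {
        if (referencingFeature.elementType != null
                && "FSArray".equals(array.typeName)) {
            array.typeName = referencingFeature.elementType + "[]";
        }
    }

    public static void main(String[] args) {
        ArrayInstance typed = new ArrayInstance();
        fixUp(typed, new FeatureDecl("annotations", "uima.tcas.Annotation"));
        System.out.println(typed.typeName);

        ArrayInstance untyped = new ArrayInstance();
        fixUp(untyped, new FeatureDecl("things", null));
        System.out.println(untyped.typeName);
    }
}
```

As the message points out, this approach is not fully accurate: an array that
is never referenced from a typed feature keeps the generic FSArray type,
because the element type is not recorded in the serialized stream.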
