uima-dev mailing list archives

From Marshall Schor <...@schor.com>
Subject Re: Design choices for changing type systems with loaded JCas classes [was Re: UIMAv3 & WebAnno]
Date Wed, 10 Jan 2018 15:52:01 GMT
Another "failure" use case:

- Load type T, with feature f1. 
- Load JCas for type T with feature f2. 
--  (merged type T has f1, f2, used to assign offsets)

Next, load type T with features f1, and f3.
- At commit, this would be "merged" with the JCas, to give f1, f3, and f2.
- But f2 already has an offset assigned, which would break the existing assignment
algorithm (which assigns offsets sequentially, in the order of the features).
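
To make the failure concrete, here is a minimal sketch of sequential offset assignment applied to the two commits above. It is an illustration only: the class and method names are hypothetical and are not UIMA APIs, and offsets start at 0 for simplicity (real offsets would also account for built-in and inherited features).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; none of these names exist in UIMA.
class SequentialOffsetDemo {

    // Assign offsets 0, 1, 2, ... in the order the features are listed.
    static Map<String, Integer> assignSequentially(List<String> features) {
        Map<String, Integer> offsets = new LinkedHashMap<>();
        for (String feature : features) {
            offsets.put(feature, offsets.size());
        }
        return offsets;
    }

    // Features whose new offset differs from the one the loaded JCas class already uses.
    static List<String> conflicts(Map<String, Integer> jcasOffsets,
                                  Map<String, Integer> newOffsets) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : jcasOffsets.entrySet()) {
            Integer now = newOffsets.get(e.getKey());
            if (now != null && !now.equals(e.getValue())) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // First commit: the type system has f1, and the JCas class contributes f2.
        Map<String, Integer> jcasOffsets = assignSequentially(Arrays.asList("f1", "f2"));
        // Second commit: the type system has f1, f3; f2 is appended from the JCas class.
        Map<String, Integer> second = assignSequentially(Arrays.asList("f1", "f3", "f2"));
        System.out.println("offsets fixed in JCas: " + jcasOffsets);
        System.out.println("second commit:         " + second);
        System.out.println("conflicting features:  " + conflicts(jcasOffsets, second));
    }
}
```

In this sketch f2 lands at offset 2 on the second commit, while the already-loaded JCas class expects it at offset 1, so f2 is reported as a conflict.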

To attempt to overcome this in some cases, an algorithm would be needed which
attempts to assign offsets, constrained by any existing offsets present in any
or all of the JCas classes for this type and its supertypes.
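
One possible shape for such a constrained assignment is sketched below. It is not the UIMA implementation: the names are hypothetical, offsets start at 0, and it handles a single flat feature list rather than walking the supertype chain. The assumption is that offsets already fixed by loaded JCas classes are honored first, the remaining features fill the lowest free slots, and any case that cannot be satisfied is reported as an error.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; none of these names exist in UIMA.
class ConstrainedOffsetDemo {

    // features: all features of the merged type.
    // fixed: offsets already assigned by loaded JCas classes (may cover only some features).
    static Map<String, Integer> assign(List<String> features, Map<String, Integer> fixed) {
        Map<String, Integer> offsets = new LinkedHashMap<>();
        Set<Integer> used = new HashSet<>();

        // Pass 1: keep every offset that a loaded JCas class already depends on.
        for (String feature : features) {
            Integer slot = fixed.get(feature);
            if (slot == null) {
                continue;
            }
            if (slot < 0 || slot >= features.size()) {
                // Honoring this offset would leave an unused hole in the layout.
                throw new IllegalStateException("offset " + slot + " out of range for " + feature);
            }
            if (!used.add(slot)) {
                throw new IllegalStateException("two features claim offset " + slot);
            }
            offsets.put(feature, slot);
        }

        // Pass 2: give each remaining feature the lowest slot that is still free.
        int next = 0;
        for (String feature : features) {
            if (offsets.containsKey(feature)) {
                continue;
            }
            while (used.contains(next)) {
                next++;
            }
            offsets.put(feature, next);
            used.add(next);
        }
        return offsets;
    }

    public static void main(String[] args) {
        Map<String, Integer> fixed = new LinkedHashMap<>();
        fixed.put("f1", 0);
        fixed.put("f2", 1);
        // Second commit from the use case: f1, f3 from the type system, f2 merged from the JCas class.
        System.out.println(assign(List.of("f1", "f3", "f2"), fixed));
    }
}
```

For the use case above, this keeps f1 at 0 and f2 at 1 and places f3 at 2. It only helps "in some cases": if two loaded JCas classes fix different features at the same offset, the sketch throws rather than guessing.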

-Marshall

On 1/10/2018 10:42 AM, Marshall Schor wrote:
> The initial implementation requires features in the type system have an ordering
> that is consistent with what got assigned when the JCas was loaded.
>
> Some use cases with comments:
>
> 1) Type T loaded with features f1, f2, f3,  JCas loaded with f1, f2, f3
> Followed by: Type T loaded with features f1, f3.
>
> At the 2nd Type T commit time, this causes type T to be augmented with
> feature f2.
> But, the (current) impl just does an "addFeature" API call.  The result is that
> without extra work, the features in the type system will be ordered as f1, f3,
> f2.  And the assigned offsets could be different. 
>
> To fix this, the algorithm which assigns offsets will need to see if the
> corresponding JCas class (if any) has offsets already assigned, and try to use
> those.
>
> 2) Type T having supertype TS; Type T has 1 feature, f1, JCas for Type T has 1
> feature f1.  TS has no features, no JCas for TS or JCas for TS has no features. 
> Followed by: Type TS is loaded, having one feature (not in the JCas if there is
> one for TS).
>
> This causes the features for type T (which includes all the features of its
> supertype), to have offsets shifted down.
> For example if T has feature f1 with offset "3",  it would now have offset "4"
> (accounting for the space taken by the TS feature).
>
> ========================================
>
> Because of these issues, I'm wondering if it's really worth the time and
> complexity to implement this "partial" solution, given that there are "complete"
> solutions of the following form:
>
> 1) Require users doing this kind of operation to first load a "merged" type
> system, creating a maximal-featured version (at least for all types / supertypes
> having user-defined JCas classes) over all type systems that will be processed,
> and use that to load (for the first time) the JCas classes.  When subset type
> systems are loaded subsequently by the application, they might cause failures
> (see supertype example use-case above).  To get around that, the application
> would need to change to always use the maximal type system for all loaded CASs. 
> Some deserializations allow deserializing a CAS with a subset-type-system into a
> CAS with a maximal type system.
>
> 2) Require users who want to have different type systems to load them using
> different class loaders (for the JCas classes).   This should work for all cases.
> ======================================
>
> 2 questions for the user community:
>
> A) Does the user community think this enhancement is of sufficient value, with
> all of its limitations, to be worth doing?  I could go either way on this,
> personally.
>
> B) Is the extra work to figure out a mapping for case 1 at the top (arranging
> the ordering of features to attempt to preserve the fixed values for the loaded
> JCas offsets) worth doing?  (If not done, it would still be "checked", and users
> would know a situation arose that they need to fix).
> My feeling is that this is not worth the effort for the few cases it might enable.
>
> -Marshall
>
> On 1/9/2018 4:53 PM, Marshall Schor wrote:
>> I did an initial implementation, ignoring Pear files.
>>
>> I think the "feature expansion" when loading PEAR-classpath specified JCas
>> classes can't reasonably be done (because by the time you lazily get around to
>> loading these, the type system is committed).
>>
>> So, I plan to have the pear loading path operate like before, with no feature
>> expansion.
>>
>> I kind of doubt this will be a real issue in actual practice (he said hopefully
>> :-) ).
>>
>> Still need to fix up some test cases, but it's looking promising...
>>
>> -Marshall
>>
>>
>> On 1/8/2018 2:47 PM, Marshall Schor wrote:
>>> In working out the details, the following difficulty emerges:
>>>
>>> In the general case, a pipeline is associated with a class loader (used to load
>>> JCas classes).
>>> When the pipeline contains "PEARs", each pear can specify its own class loader,
>>> and therefore, its own set of JCas classes.
>>>
>>> So, at type system commit time, with this proposal, it would be necessary to
>>> find all of the class loaders that Pears might be using.  This unfortunately
>>> is not possible in general, because the Pears are associated with a particular
>>> pipeline, and you can load a type system and create a CAS without referring to
>>> a particular pipeline.
>>>
>>> In the current implementation, the presence of a Pear in the pipeline is
>>> discovered (if and) when the pear is entered for the first time, and at that
>>> time (lazily) the loading of that Pear's JCas classes happens.
>>>
>>> Various limitations are possible, I suppose (e.g., not allowing a Pear version
>>> of a JCas class to have new features).
>>>
>>> Still thinking about this...
>>>
>>> -Marshall
>>>
>>>
>>> On 1/8/2018 10:16 AM, Marshall Schor wrote:
>>>> After a lot of thought, here's a proposal, along the lines Richard suggests:
>>>>
>>>> The basic idea is to have the JCas classes, if they exist for some type,
>>>> augment that type with features defined only in the JCas class.
>>>>
>>>> This augmentation would be done at type system commit time, and would really
>>>> modify the type system being committed to have the extra features.  Because
>>>> the type system would be modified to include these extra features, the Feature
>>>> Structures made with these "augmented" types would be larger (because they
>>>> would have slots for these features).  This ensures that subtypes' features
>>>> won't overlap / collide with the expanded features.
>>>>
>>>> I'll work out the details, and see if I can make this change.
>>>>
>>>> -Marshall
>>>>
>>>>
>>>> On 1/5/2018 2:05 PM, Richard Eckart de Castilho wrote:
>>>>> On 05.01.2018, at 17:16, Marshall Schor <msa@schor.com> wrote:
>>>>>> Based on Web Annot's use case, I'm thinking through alternatives.
>>>>> "WebAnno" ;)
>>>>>
>>>>>> One way to support this would be to have the user code tell the UIMA
>>>>>> framework that no reachable instances of JCas classes exist; the user would
>>>>>> be responsible for guaranteeing this.
>>>>> There may be no way for the user code to know if this is the case or
>>>>> not or to enforce this to be the case.
>>>>>
>>>>>> The other choice would be to not support this (because of the inherent
>>>>>> dangers) and instead require users having multiple type systems with JCas
>>>>>> classes specifying features only in some versions of those type systems
>>>>>> to first load the JCas classes with the feature-maximal versions of the types.
>>>>>>
>>>>>> I think I favor the 2nd approach, as it is much safer. 
>>>>>>
>>>>>> What do others think we should do?
>>>>> The current line of thinking seems to assume that:
>>>>>
>>>>> 1) a type system definition is loaded (maybe from an XML file)
>>>>> 2) a CAS is created using the TSD
>>>>> 3) the JCas classes are loaded and are initialized according to the TSD
>>>>>
>>>>> The suggestion to "first load a feature-maximal version of the types"
>>>>> seems to be following that line. I.e. the TSD loaded in 1) should cover all
>>>>> the features also covered by the JCas classes.
>>>>>
>>>>> How about a slightly different approach:
>>>>>
>>>>> 1) a type system definition is loaded (maybe from an XML file)
>>>>> 1a) the JCas classes are loaded and their definitions are merged with
>>>>>     the TSD
>>>>> 2) a CAS is created using the merged TSD
>>>>> 3) the JCas classes are initialized with the now feature-maximal type
>>>>>    system
>>>>>
>>>>> An error would/should be thrown if in step 1a the JCas classes
>>>>> and the TSD are inherently incompatible. 
>>>>>
>>>>> In this case, the JCas classes would be an additional source of type
>>>>> system information. Thinking this further, one could even initialize a CAS
>>>>> without providing any TSD, simply by having UIMA inspect the available JCas
>>>>> classes (e.g. through classpath scanning or by providing the framework with a
>>>>> list of classes). To complete this, the JCas classes could be enhanced with
>>>>> Java annotations to carry any information included in TSDs which is currently
>>>>> not included in a machine-readable way in the JCas classes, e.g. type and
>>>>> feature description text. As such, a set of suitably annotated JCas classes
>>>>> could be converted to a TSD XML and vice versa.
>>>>>
>>>>> The above assumes that JCas classes are loaded and initialized eagerly,
>>>>> but probably it could be adapted to a situation where the classes are loaded
>>>>> lazily.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> -- Richard
>>>>>
>>>>>
>

