uima-dev mailing list archives

From Richard Eckart de Castilho <...@apache.org>
Subject Re: Design choices for changing type systems with loaded JCas classes [was Re: UIMAv3 & WebAnno}
Date Mon, 08 Jan 2018 18:31:07 GMT
On 08.01.2018, at 16:16, Marshall Schor <msa@schor.com> wrote:
> 
> After a lot of thought, here's a proposal, along the lines Richard suggests:
> 
> The basic idea is to have the JCas classes, if they exist for some type, augment
> that type with features defined only in the JCas class.
> 
> This augmentation would be done at type system commit time, and would really
> modify the type system being committed to have the extra features.  Because the
> type system would be modified to include these extra features, the Feature
> Structures made with these "augmented" types would be larger (because they would
> have slots for these features).  This insures that subtypes' features won't
> overlap / collide with the expanded features.
> 
> I'll work out the details, and see if I can make this change.

After some thought, I believe the problem with the availability and ordering of
features can be sidestepped if we consider the JCas classes as a canonical source
for type system definitions.

JCas classes represent a pretty strong and rigid contract on the type system, and
there can only be one set of them available through the classloader at any given
time. XML TSDs, on the other hand, are comparatively flexible and a dime a dozen.
Arbitrary numbers of them can be merged and used to initialize a CAS.

So my suggestion would be: when using the JCas API, JCas classes are treated
as the canonical source for the type system definition. They define which types
exist, which parent types they have, and the order of the features. If
a user provides additional TSDs when initializing a CAS, then these are merged
on top of the definitions sourced from the JCas classes. In this way, features
defined in JCas classes can never be missing and they always have a defined order,
irrespective of the presence of any other TSDs. If any additional features are
defined in TSDs, then they need to be accessed through the CAS API anyway. I believe
there would also be no issues with subtypes in this "JCas first" scenario.
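
To make the merge order concrete, here is a minimal sketch of the "JCas first"
idea. It is a toy model in plain Java, not UIMA code: the class JCasFirstMerge,
the method merge, and the type and feature names are all hypothetical, and type
systems are reduced to maps from type name to an ordered feature list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: hypothetical names, no UIMA classes involved.
public class JCasFirstMerge {

    /**
     * Merges per-type feature lists. Features from the JCas-derived definitions
     * keep their positions; features that appear only in the TSDs are appended.
     */
    static Map<String, List<String>> merge(Map<String, List<String>> fromJCas,
                                           Map<String, List<String>> fromTsd) {
        Map<String, Set<String>> merged = new LinkedHashMap<>();
        // JCas-derived definitions go in first, so their feature order is fixed.
        fromJCas.forEach((type, features) ->
            merged.computeIfAbsent(type, t -> new LinkedHashSet<>()).addAll(features));
        // TSD definitions are layered on top; re-adding a known feature is a no-op.
        fromTsd.forEach((type, features) ->
            merged.computeIfAbsent(type, t -> new LinkedHashSet<>()).addAll(features));

        Map<String, List<String>> result = new LinkedHashMap<>();
        merged.forEach((type, features) -> result.put(type, new ArrayList<>(features)));
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> jcas = new LinkedHashMap<>();
        jcas.put("Token", List.of("begin", "end", "pos"));

        Map<String, List<String>> tsd = new LinkedHashMap<>();
        tsd.put("Token", List.of("lemma", "begin"));
        tsd.put("Sentence", List.of("begin", "end"));

        // prints {Token=[begin, end, pos, lemma], Sentence=[begin, end]}
        System.out.println(merge(jcas, tsd));
    }
}
```

In this model "pos" is present even though the TSD never mentions it, and
"lemma" lands after the JCas-defined features no matter where the TSD lists it.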

This approach would also avoid errors when accessing features that are defined
in a JCas class but not in an XML TSD, since the features are defined via their
presence in the JCas class.

A potential downside is that users who initialize a CAS with a small XML TSD but
who have rich JCas classes on the classpath might end up with more memory usage
than they asked for - I assume that would rarely happen. This could be mitigated
by only initializing JCas classes if their types are actually defined in the
user-provided TSD at initialization time. Finally, users who really do not want
to have any JCas classes affect their CASes could maybe entirely disable JCas
for a given CAS instance - I thought I had seen an option somewhere to do that
years ago, but I can't find it at the moment.
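
The mitigation mentioned above (only initializing JCas classes whose types the
user-provided TSD actually defines) could look roughly like this. Again a toy
model with hypothetical names (JCasTypeFilter, activeJCasTypes), not an existing
UIMA facility: it just keeps the JCas-derived definitions whose type name also
occurs in the TSD.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: hypothetical names, no UIMA classes involved.
public class JCasTypeFilter {

    /**
     * Returns the JCas-derived definitions for types that the user-provided TSD
     * also defines; JCas classes for any other type are left out entirely.
     */
    static Map<String, List<String>> activeJCasTypes(Map<String, List<String>> onClasspath,
                                                     Set<String> tsdTypeNames) {
        Map<String, List<String>> active = new LinkedHashMap<>();
        onClasspath.forEach((type, features) -> {
            if (tsdTypeNames.contains(type)) {
                active.put(type, features);
            }
        });
        return active;
    }

    public static void main(String[] args) {
        Map<String, List<String>> onClasspath = new LinkedHashMap<>();
        onClasspath.put("Token", List.of("begin", "end", "pos"));
        onClasspath.put("Entity", List.of("begin", "end", "kind"));

        // The small TSD only mentions Token and Sentence.
        Set<String> tsdTypes = Set.of("Token", "Sentence");

        // prints {Token=[begin, end, pos]}
        System.out.println(activeJCasTypes(onClasspath, tsdTypes));
    }
}
```

Here the "Entity" JCas class never contributes features, so a CAS built from
the small TSD does not pay for slots the user did not ask for.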

What do you think?

Cheers,

-- Richard