uima-dev mailing list archives

From "Adam Lally" <ala...@alum.rpi.edu>
Subject Re: changing the semantics (very slightly) for JCas objects with respect to Views
Date Thu, 10 May 2007 22:29:21 GMT
On 5/10/07, Marshall Schor <msa@schor.com> wrote:
> While working on the class-loader switching code, we have revisited an
> issue with the way JCas objects work with respect to views.
>
> Currently, for each view, there is a separate set of xxx_Type objects and
> a separate set of "cached" cover objects (which are identical to other
> views' objects, except that their _Type ref points to the instance for
> this view).
>
> As far as we can see, this exists only to support
> aJCasObject.addToIndexes() and removeFromIndexes(), which use this
> information to pick the right "view" to use (remember that indexes are
> held per view).
>
> Besides inefficiency (replication of objects per view), there is another
> side-effect.  JCas Objects can, themselves, be extended by users to hold
> additional information, other than what's in the CAS.  The current
> design would create new versions of these objects per view, so that
> iterators over different views would get different instances.  So
> information set into one JCas object in one view would not be "visible"
> to instances obtained by iterating using a different view's index.  This
> could be a documented "feature", or it could be a "bug".
>
> Because current users seem to often use the aJCasObject.addToIndexes()
> method, I want to retain that method, while getting the efficiencies and
> fixing the "bug" (if we consider it a bug) above.  To do this, we could
> make this work as before *for sofa-unaware annotators, only* as
> follows:  Change the impl of addToIndexes and removeFromIndexes to
> reference the "current-view".
>
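A minimal toy model may make the per-view design described in the quoted message concrete. The names View, TypeObj and Cover are hypothetical stand-ins for a CAS view, an xxx_Type object and a JCas cover object; this is a sketch under those assumptions, not UIMA's API or its actual implementation. It shows the two properties Marshall describes: addToIndexes() finds its view through the per-view type object, and the same feature structure seen through two views yields two distinct Java objects, so extra Java-side state set on one is not visible on the other.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the per-view design. View, TypeObj and Cover are hypothetical
// stand-ins for a CAS view, an xxx_Type object and a JCas cover object.
class PerViewCovers {

    static class View {
        final String name;
        final List<Integer> index = new ArrayList<>(); // FS addresses indexed in this view
        View(String name) { this.name = name; }
    }

    // One TypeObj per (type, view) pair, each with its own cover-object cache.
    static class TypeObj {
        final View view;
        final Map<Integer, Cover> cache = new HashMap<>();
        TypeObj(View view) { this.view = view; }

        // Returns this view's cached cover for a CAS address, creating it on first use.
        Cover cover(int addr) {
            return cache.computeIfAbsent(addr, a -> new Cover(a, this));
        }
    }

    static class Cover {
        final int addr;
        final TypeObj type;
        String userNote; // extra Java-side state a user-extended JCas class might add

        Cover(int addr, TypeObj type) { this.addr = addr; this.type = type; }

        // The cover finds "its" view through the per-view TypeObj.
        void addToIndexes() { type.view.index.add(addr); }
    }

    public static void main(String[] args) {
        View v1 = new View("v1");
        View v2 = new View("v2");
        TypeObj t1 = new TypeObj(v1);
        TypeObj t2 = new TypeObj(v2);

        Cover c1 = t1.cover(42); // the same CAS feature structure (address 42) ...
        Cover c2 = t2.cover(42); // ... obtained through a different view
        c1.userNote = "set via v1";

        System.out.println("same Java object? " + (c1 == c2));
        System.out.println("note seen via v2: " + c2.userNote);
        c1.addToIndexes();
        System.out.println("v1 index: " + v1.index + ", v2 index: " + v2.index);
    }
}
```

Running main should report that the two covers are different objects, that the note reads as null through v2, and that only v1's index holds address 42.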

It makes me very nervous to put in changes that intentionally break
compatibility with existing annotators.  One of UIMA's main goals is
to make it easier to integrate analytics into applications.  We don't
want application developers to be concerned that if they update their
UIMA version, suddenly components that used to work will no longer
work.  Sometimes that means we're stuck with a suboptimal design for
something, but that's life.  If we *really* want to stop supporting
this method, deprecate it first and then wait a few years for the 3.0
release to come out, and think about doing it then. ;)

It seems the argument here is that there aren't very many multi-sofa
annotators, so breaking them (the ones that use JCas anyway) is not
that big a deal.  I'm just not sure how to judge that.

With that general comment out of the way, let me consider this
specific issue.  I think the code that would break is this:

JCas someView = baseJCas.getView(name);
MyAnnotation annot = new MyAnnotation(someView, begin, end);
annot.addToIndexes();

It would no longer add the annotation to someView.  Instead it would
try to add it to the current view (which I think is the "initial view"
in this case).
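The breakage can be sketched in the same toy style, assuming (as suggested above) that the "current view" in this case is the initial view. Cas, View and Annot are hypothetical classes, not UIMA's; the constructor receives the view the annotation was created for, but the proposed addToIndexes() ignores it and uses the current view instead.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the proposed semantics: addToIndexes() uses the CAS's
// "current view" rather than a per-view _Type object. Cas, View and Annot
// are hypothetical stand-ins, not UIMA classes.
class CurrentViewProposal {

    static class View {
        final String name;
        final List<Annot> index = new ArrayList<>();
        View(String name) { this.name = name; }
    }

    static class Cas {
        final View initialView = new View("_InitialView");
        // Assumption: for this annotator the current view is the initial view.
        View currentView = initialView;
        View createView(String name) { return new View(name); }
    }

    static class Annot {
        final Cas cas;
        final int begin, end;
        // The view argument mirrors new MyAnnotation(someView, begin, end),
        // but under the proposal nothing here remembers it.
        Annot(Cas cas, View view, int begin, int end) {
            this.cas = cas; this.begin = begin; this.end = end;
        }
        // Proposed behavior: index in whatever view is current.
        void addToIndexes() { cas.currentView.index.add(this); }
    }

    public static void main(String[] args) {
        Cas cas = new Cas();
        View someView = cas.createView("someView");
        Annot annot = new Annot(cas, someView, 0, 5);
        annot.addToIndexes();
        System.out.println("in someView: " + someView.index.contains(annot));
        System.out.println("in initial view: " + cas.initialView.index.contains(annot));
    }
}
```

In this model the annotation ends up in the initial view's index and not in someView's, which is exactly the silent change in behavior at issue.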

But wait... for types derived from annotations, we already know what
view it should be indexed in.  Just follow the annotation's Sofa
reference and you will find the right view.  It's not valid to index
it in any other view, and in fact that results in an exception.

So instead of indexing in the "current view" (which might fail), for
annotations you could always index in the "correct view". :)  This
should not break any annotators.
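A sketch of the Sofa-based alternative, again with hypothetical toy classes rather than UIMA's real ones: each annotation records the Sofa of the view it was created in, and addToIndexes() looks up the view that owns that Sofa, so the result no longer depends on which view is current. In this model the someView snippet above indexes into someView even though the current view is still the initial view.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of "follow the Sofa reference": an annotation records the Sofa it
// was created against, and addToIndexes() looks up the view owning that Sofa.
// Sofa, View, Cas and Annot are hypothetical stand-ins, not UIMA classes.
class SofaRoutedIndexing {

    static final class Sofa {
        final String id;
        Sofa(String id) { this.id = id; } // identity equality is enough for the map below
    }

    static class View {
        final Sofa sofa;
        final List<Annot> index = new ArrayList<>();
        View(Sofa sofa) { this.sofa = sofa; }
    }

    static class Cas {
        final Map<Sofa, View> viewBySofa = new HashMap<>();
        View currentView;
        Cas() { currentView = createView("_InitialView"); }

        View createView(String sofaId) {
            View v = new View(new Sofa(sofaId));
            viewBySofa.put(v.sofa, v);
            return v;
        }
    }

    static class Annot {
        final Cas cas;
        final Sofa sofa; // set once, from the view the annotation was created in
        final int begin, end;
        Annot(Cas cas, View view, int begin, int end) {
            this.cas = cas; this.sofa = view.sofa; this.begin = begin; this.end = end;
        }
        // Index in the view that owns this annotation's Sofa; currentView is ignored.
        void addToIndexes() { cas.viewBySofa.get(sofa).index.add(this); }
    }

    public static void main(String[] args) {
        Cas cas = new Cas();
        View someView = cas.createView("someView");
        Annot annot = new Annot(cas, someView, 0, 5);
        annot.addToIndexes(); // currentView is still the initial view
        System.out.println("in someView: " + someView.index.contains(annot));
        System.out.println("in current view: " + cas.currentView.index.contains(annot));
    }
}
```

The design point is that the Sofa reference already carries the information the per-view _Type object was providing, so no extra per-view objects are needed for annotation types.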

That leaves non-annotation types (and only those with custom indexes
defined for them matter).  For those we could:

(a) Decide we don't care about breaking this.  At this point the
number of affected annotators might be zero, but we can't be sure.  I
still don't like it on general principles.

(b) Optimize what we can get away with.  Only create extra _Type
objects for non-annotation types for which custom indexes are defined.
That has the downside of probably making the JCas code ugly and more
prone to bugs.

Next thought... what if instead of the separate _Type objects, you
maintained in the JCasImpl a map from JCas object to "home view".  As
above you only need to do this for non-annotation objects that have
indexes defined for them (AND if the home view is not the initial
view, since that could be the default).

That has different performance characteristics, which may or may not
be better (I think it depends on how many instances get created),
but maybe it is a cleaner way to stay compatible.
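A minimal sketch of the "home view" map, assuming one shared cover object per feature structure and a CAS-level IdentityHashMap keyed on that cover object. Cas, View, Fs, homeViews and the hasCustomIndex flag are hypothetical names for illustration only. Following the conditions above, an entry is recorded only for non-annotation objects of custom-indexed types whose home view is not the initial view; everything else defaults to the initial view. Annotations are left out of the model, since they can find their view through the Sofa reference.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the "home view" map idea. All names (Cas, View, Fs, homeViews,
// hasCustomIndex) are hypothetical; annotations are not modeled here.
class HomeViewMap {

    static class View {
        final String name;
        final List<Fs> index = new ArrayList<>();
        View(String name) { this.name = name; }
    }

    // A non-annotation feature structure with a single shared cover object.
    static class Fs {
        final Cas cas;
        final boolean hasCustomIndex; // stands in for "a custom index covers this type"
        Fs(Cas cas, View createdIn, boolean hasCustomIndex) {
            this.cas = cas;
            this.hasCustomIndex = hasCustomIndex;
            cas.recordHomeView(this, createdIn);
        }
        void addToIndexes() {
            if (!hasCustomIndex) return; // no index to update in this model
            cas.homeViewOf(this).index.add(this);
        }
    }

    static class Cas {
        final View initialView = new View("_InitialView");
        // Identity semantics: keyed on the Java cover object, not on equals().
        final Map<Fs, View> homeViews = new IdentityHashMap<>();

        void recordHomeView(Fs fs, View view) {
            // Record only what the default (initial view) cannot express.
            if (fs.hasCustomIndex && view != initialView) {
                homeViews.put(fs, view);
            }
        }

        View homeViewOf(Fs fs) {
            return homeViews.getOrDefault(fs, initialView);
        }
    }

    public static void main(String[] args) {
        Cas cas = new Cas();
        View other = new View("other");
        Fs a = new Fs(cas, other, true);           // recorded: custom index, non-initial home
        Fs b = new Fs(cas, cas.initialView, true); // not recorded: the default covers it
        Fs c = new Fs(cas, other, false);          // not recorded: no custom index
        a.addToIndexes();
        b.addToIndexes();
        c.addToIndexes();
        System.out.println("map entries: " + cas.homeViews.size());
        System.out.println("other holds a: " + other.index.contains(a));
        System.out.println("initial holds b: " + cas.initialView.index.contains(b));
    }
}
```

In this sketch the cost scales with the number of recorded instances rather than with the number of views times the number of types, which is the trade-off hinted at above: cheaper when few such instances exist, potentially worse when many do.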

Hope that helped,
  -Adam
