uima-dev mailing list archives

From "Adam Lally" <ala...@alum.rpi.edu>
Subject Re: [SUMMARY] CAS and CasView redesign
Date Wed, 20 Dec 2006 14:51:52 GMT
On 12/19/06, Marshall Schor <msa@schor.com> wrote:
> > Basic Ideas:
> > *  The CAS is the container for all of the analysis data (as per the
> > UIMA spec).  It should be possible to create FS directly on the CAS
> > and there should be some reasonable way to access all FS in the CAS.
> This seems reasonable - especially for "simpler" CAS Processors.

Then you agree that there should be a reasonable way to access all FS
in the CAS... we will have to sort out the definition of "reasonable"
-- see below.  Also in case it was not clear, when I wrote this I
meant "there should be some reasonable way to access all FS in the CAS,
without having to be concerned with views" - I probably should have
said that explicitly.

> >   * Each defined index will have one instance in the CAS as well as an
> >   instance for each view (or sofa?  right now sofas and views are 1-1 so
> >   it doesn't matter but I wonder what the right terminology is)
> >   * You can add FS to the indexes in a view (or multiple views).  You
> >   can also add FS to the indexes on the CAS, which is a place to store
> >   indexed FS that don't belong to any view.
> The only use case for this that I remember was "globally" indexed
> data, meaning that data is shared among a set of annotators.  But
> there's a problem with this - once you put that set of annotators together
> with another set, you run the risk of collisions among "shared" items.
> Sharing among a set is better served by having a specially named
> view.  Using a global one may expose one to future problems when combining
> independently written parts.
> >
> >   * If you get an iterator over an index from the CAS, this iterator
> >   will return you FS that were indexed in the CAS as well as FS that were
> >   indexed in any view.
> An interesting design choice.  What are the use-cases for making it work
> this way?  What do we give up when we choose this way, versus just
> returning those FSs that were specifically indexed in the base view?

The idea is to think of this design choice as supporting the above
requirement that the base CAS provides a reasonable way of directly
accessing *all* FS in the CAS, without resorting to views.  Getting
iterators off the base CAS is a way of accessing all of the CAS
content without regard to what view that content is in.  There is no
such thing as "the base view" with its own exclusive set of indexed
FeatureStructures.  The base CAS is not a view (it will have a
different interface, after all).

An alternative design would be to have no indexes accessible from the
base CAS.  Instead you could create FS on the base CAS and you could
get an iterator over all FS in the CAS in no particular order and with
no filtering.  But this isn't what I was thinking was "reasonable".
It seems overly limited without a good reason.  A use case might be
that I want all Person annotations in the CAS, regardless of what view
they are in, and I will just look at their covered text and do
something with it.  Can I get from the base CAS an iterator that
returns me just Person annotations?
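
To make the Person use case concrete, here is a minimal sketch of a base
CAS whose iterator covers FS indexed on the CAS itself plus FS indexed in
any view.  Every name in it (FS, View, BaseCas, select, the view names) is
hypothetical and invented for illustration; none of it is the existing
UIMA API or a settled design.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BaseCasIndexSketch {
    /** Hypothetical stand-in for a feature structure (not the UIMA class). */
    static final class FS {
        final String type;
        final String coveredText;
        FS(String type, String coveredText) {
            this.type = type;
            this.coveredText = coveredText;
        }
    }

    /** Hypothetical view: sees only the FS added to its own indexes. */
    static final class View {
        private final List<FS> indexed = new ArrayList<>();
        void addToIndexes(FS fs) { indexed.add(fs); }
        List<FS> select(String type) { return filter(indexed, type); }
    }

    /** Hypothetical base CAS: creates FS and can iterate over everything. */
    static final class BaseCas {
        private final List<FS> indexed = new ArrayList<>();
        private final Map<String, View> views = new LinkedHashMap<>();

        FS createFS(String type, String coveredText) { return new FS(type, coveredText); }
        View getView(String name) { return views.computeIfAbsent(name, n -> new View()); }
        void addToIndexes(FS fs) { indexed.add(fs); }

        /** FS of the given type indexed on the CAS itself or in any view. */
        List<FS> select(String type) {
            List<FS> result = new ArrayList<>(filter(indexed, type));
            for (View view : views.values()) {
                result.addAll(view.select(type));
            }
            return result;
        }
    }

    /** Keeps only the FS whose type name matches exactly (no subtyping here). */
    static List<FS> filter(List<FS> source, String type) {
        List<FS> out = new ArrayList<>();
        for (FS fs : source) {
            if (fs.type.equals(type)) out.add(fs);
        }
        return out;
    }

    public static void main(String[] args) {
        BaseCas cas = new BaseCas();
        cas.getView("english").addToIndexes(cas.createFS("Person", "Adam"));
        cas.getView("german").addToIndexes(cas.createFS("Person", "Marshall"));
        cas.getView("english").addToIndexes(cas.createFS("Token", "the"));
        cas.addToIndexes(cas.createFS("Person", "indexed on the CAS only"));
        // Person FS from both views and from the CAS-level index.
        for (FS fs : cas.select("Person")) {
            System.out.println(fs.coveredText);
        }
    }
}
```

Note that this sketch does not deduplicate: an FS added to the indexes of
two views would come back twice from the base CAS.  How that case should
behave is an open question that the sketch does not try to answer.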

> > * Change CAS.getView(...) APIs to return the new type CasView.
> > CasView will have Sofa access methods and indexing methods but not
> > FS-creation methods.  (Except, maybe createAnnotation methods - see
> > next point.)
> Good point - drives home the idea that the FS creation is done always in
> the base CAS.

I think we have a fair amount of agreement on that now  -- at least
that's something :)

I think it then seems logical that if you create FS on the base CAS,
you can also access those FS from the base CAS?

> > * A CasView is a way of accessing a subset of data in the CAS.  To
> > accomplish this a CasView has its own index repository.  A CasView may
> > also have a Sofa -- if it does this means that annotations in its
> > index repository must refer to that Sofa.
> This is good, if we're tying Sofas to CasViews.  But I think this is not
> a necessary tie.
> (e.g. you could have multiple sofas associated with one CasView, or
> multiple CasViews associated with one Sofa).

... or a CasView associated with no Sofa at all.  I agree.  I should
have capitalized MAY in "A CasView MAY also have a Sofa."  This is
consistent with the architecture spec.  But actually implementing
Sofa-less or multi-Sofa views isn't high on my priority list.
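
As a rough sketch of the "MAY have a Sofa" rule: a view anchored to a Sofa
only accepts annotations that refer to that Sofa, while a Sofa-less view
has no such restriction.  The class names, the use of null for "no Sofa",
and the choice to enforce the rule by throwing an exception are all
assumptions made for illustration, not an agreed design.

```java
import java.util.ArrayList;
import java.util.List;

public class AnchoredViewSketch {
    /** Hypothetical Sofa stand-in: just an identifier. */
    static final class Sofa {
        final String id;
        Sofa(String id) { this.id = id; }
    }

    /** Hypothetical annotation that points at exactly one Sofa. */
    static final class Annotation {
        final int begin;
        final int end;
        final Sofa sofa;
        Annotation(int begin, int end, Sofa sofa) {
            this.begin = begin;
            this.end = end;
            this.sofa = sofa;
        }
    }

    /** Hypothetical view with its own index repository and an optional Sofa. */
    static final class CasView {
        private final Sofa sofa; // null models a Sofa-less view (assumption)
        private final List<Annotation> index = new ArrayList<>();

        CasView(Sofa sofa) { this.sofa = sofa; }

        void addToIndexes(Annotation a) {
            // An anchored view rejects annotations that refer to another Sofa.
            if (sofa != null && a.sofa != sofa) {
                throw new IllegalArgumentException(
                    "annotation refers to Sofa " + a.sofa.id + ", view is anchored to " + sofa.id);
            }
            index.add(a);
        }

        int size() { return index.size(); }
    }

    public static void main(String[] args) {
        Sofa text = new Sofa("text");
        Sofa audio = new Sofa("audio");
        CasView anchored = new CasView(text);
        anchored.addToIndexes(new Annotation(0, 4, text));
        try {
            anchored.addToIndexes(new Annotation(0, 4, audio));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```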

> > * As I think about this now, it occurs to me that we should have a
> > method CAS.createAnnotation(int begin, int end, SofaFS sofa) to allow
> > annotations to be created off the CAS (consistent with the idea that
> > all analysis data can be created and accessed from the CAS).  But we
> > might also want CasView.createAnnotation(int begin, int end) as a
> > convenience.
> Both createAnnotation methods are off of instances of CAS or CasView  -
> they're not static, right?  In other words:
> aCas.createAnnotation(...) or
> aCasView.createAnnotation(...)?

They're not static.

CAS.createAnnotation(int begin, int end, SofaFS sofa) is defined on
the CAS interface.  You need to specify which SofaFS you want the
annotation to point to.

CasView.createAnnotation(int begin, int end), where you don't have to
specify a SofaFS, is defined on the CasView interface.  This is a
convenience method for creating annotations that point to the Sofa for
a particular view you're working on.  (It would only work for
single-Sofa ["anchored"] views).
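
A minimal sketch of how the two methods could relate, with the view
method simply delegating to the CAS method and filling in its own Sofa.
The two createAnnotation signatures follow the proposal above; everything
else (the simplified SofaFS and Annotation classes, the createView helper,
and failing with IllegalStateException on a Sofa-less view) is
hypothetical and only there to make the example self-contained.

```java
public class CreateAnnotationSketch {
    /** Simplified, hypothetical SofaFS: an id plus the subject text. */
    static final class SofaFS {
        final String id;
        final String text;
        SofaFS(String id, String text) {
            this.id = id;
            this.text = text;
        }
    }

    /** Simplified, hypothetical annotation over a span of one Sofa. */
    static final class Annotation {
        final int begin;
        final int end;
        final SofaFS sofa;
        Annotation(int begin, int end, SofaFS sofa) {
            this.begin = begin;
            this.end = end;
            this.sofa = sofa;
        }
        String coveredText() { return sofa.text.substring(begin, end); }
    }

    static final class Cas {
        /** Defined on the CAS: the caller names the Sofa explicitly. */
        Annotation createAnnotation(int begin, int end, SofaFS sofa) {
            return new Annotation(begin, end, sofa);
        }
        /** Hypothetical helper; pass null for a Sofa-less view. */
        CasView createView(SofaFS sofa) { return new CasView(this, sofa); }
    }

    static final class CasView {
        private final Cas cas;
        private final SofaFS sofa; // null when the view has no Sofa

        CasView(Cas cas, SofaFS sofa) {
            this.cas = cas;
            this.sofa = sofa;
        }

        /** Convenience: only meaningful for a single-Sofa ("anchored") view. */
        Annotation createAnnotation(int begin, int end) {
            if (sofa == null) {
                throw new IllegalStateException("view has no Sofa to anchor the annotation to");
            }
            return cas.createAnnotation(begin, end, sofa);
        }
    }

    public static void main(String[] args) {
        Cas cas = new Cas();
        SofaFS sofa = new SofaFS("text", "Adam Lally wrote this");
        Annotation viaCas = cas.createAnnotation(0, 4, sofa);
        Annotation viaView = cas.createView(sofa).createAnnotation(5, 10);
        System.out.println(viaCas.coveredText() + " " + viaView.coveredText());
    }
}
```

Either way the FS ends up created by the CAS; the view method only saves
the caller from passing the Sofa.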

> Might want to treat this convenience function in the context of backwards
> compatibility.

CasView would be a completely new interface so for that we don't have
to worry.  The backwards compatibility question is what to do with
CAS.createAnnotation(int begin, int end).  That's covered on the other
mail thread, the one devoted to backwards compatibility.

