uima-dev mailing list archives

From "Adam Lally" <ala...@alum.rpi.edu>
Subject Re: CAS and CasView redesign - question if all views should share the same indexes?
Date Wed, 20 Dec 2006 15:16:38 GMT
On 12/19/06, Marshall Schor <msa@schor.com> wrote:
> If we think of a CasView as a way of accessing a subset of the data
> in the CAS, what are the pluses and minuses of having every view
> have the same (shared) index definitions?  Would it make more sense
> to have each view have its own non-shared set of indexes / definitions?

Maybe... we might extend the index descriptor format to allow
specifying a set of view names to which the index applies.  And in the
absence of such a specification, the index might apply only to the
views of the component's declared input and output sofas.  For
"sofa-unaware" annotators (or whatever we're calling them this week ;)
this would mean that the index only applies to the one view that they
operate on (which is specified by sofa mappings).  Although I'm
concerned about what happens if sofa mapping becomes dynamic.
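
To make the defaulting rule above concrete, here is a rough sketch in
Java.  Everything in it is hypothetical: ScopedIndexSpec and
IndexViewResolver are made-up names, not part of the UIMA API, and I am
assuming that explicitly listed view names are used as given while
declared sofa names are translated through the sofa mappings.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical index definition carrying an optional set of view names.
// Not part of the UIMA API; for illustration only.
final class ScopedIndexSpec {
    final String label;
    final Set<String> viewNames; // empty means "no views specified"

    ScopedIndexSpec(String label, Set<String> viewNames) {
        this.label = label;
        this.viewNames = viewNames;
    }
}

final class IndexViewResolver {
    /**
     * Returns the names of the views the index applies to.
     * If the spec lists views, those win (assumption: used as given).
     * Otherwise fall back to the component's declared input/output sofas,
     * each translated through the sofa mappings when a mapping exists.
     */
    static Set<String> applicableViews(ScopedIndexSpec spec,
                                       List<String> declaredSofas,
                                       Map<String, String> sofaMappings) {
        if (!spec.viewNames.isEmpty()) {
            return new LinkedHashSet<>(spec.viewNames);
        }
        Set<String> views = new LinkedHashSet<>();
        for (String sofa : declaredSofas) {
            views.add(sofaMappings.getOrDefault(sofa, sofa));
        }
        return views;
    }
}
```

For a sofa-unaware annotator, declaredSofas would hold just the one
default sofa name, so the result is the single view that the sofa
mapping points it at.  Note that this computes the answer once from a
fixed mapping, which is exactly what stops working if sofa mapping
becomes dynamic.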

All in all, without a concrete use case where there is currently a
significant performance issue, I would put off adding this feature.

> But some components need specific indexes (and type priorities :-)
> in order to correctly iterate through sets of FSs.  In this case, the
> component part is closely associated with the index specification.
> For better modularity - if I had a component operating on a particular
> view, needing a particular index specification, these might be
> associated to the component - and having such an index as a "global"
> thing might lead to unwanted "collisions" in the index "name-space",
> although this could be minimized by having some uniqueness to the
> index name.  So if I called the index "ComponentAsIndex", it would
> make more sense if this was associated only with Component A, and not
> globally.  This doesn't quite match associating the index with just one
> view, I admit.

Component-specific indexes also seem like a good idea (to do someday).
One reason is to allow an optimization for remote annotators.  There's
no reason to actually build the index on the client side if it's only
needed by a remote annotator, given that the index itself isn't
serialized to the remote node.  We need only keep a list of indexed
FSs, and build the index on the remote node as we do already.
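
A minimal sketch of that optimization, in Java.  LazyIndex is a
hypothetical class, not how the CAS index repository is actually
implemented; it only shows the idea of recording indexed items cheaply
and deferring the sort until somebody on this node actually iterates.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical lazily built index: adding an item only appends to a list.
// The sorted structure is built on first local access, so a client that
// only ships the item list to a remote node never pays for the sort.
final class LazyIndex<T> {
    private final Comparator<T> sortOrder;
    private final List<T> indexedItems = new ArrayList<>();
    private List<T> sorted; // null until a local reader asks for it

    LazyIndex(Comparator<T> sortOrder) {
        this.sortOrder = sortOrder;
    }

    void add(T item) {
        indexedItems.add(item);
        sorted = null; // invalidate any previously built index
    }

    // What would be serialized to the remote node: items in insertion order.
    List<T> indexedItems() {
        return Collections.unmodifiableList(indexedItems);
    }

    boolean isBuilt() {
        return sorted != null;
    }

    // Builds the sorted index on demand and returns a read-only view of it.
    List<T> sortedView() {
        if (sorted == null) {
            sorted = new ArrayList<>(indexedItems);
            sorted.sort(sortOrder);
        }
        return Collections.unmodifiableList(sorted);
    }
}
```

The remote node would rebuild its own index from indexedItems(), which
matches what happens today; the only change is that the client side
skips sortedView() unless a local annotator needs it.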

Also we can deal with name collisions - two annotators could declare
different indexes with the same label, but since they are specific to
the component that is OK.  When each component executes
IndexRepository.getIndex(label), it would get the index that it itself
had declared.  This could be implemented the same way we are currently
handling Sofa mapping - the CAS "knows" what annotator is currently
processing it.  Of course if two annotators declared indexes over the
same type (or where one type is an ancestor of the other) with the
same sort keys, they should be merged into one index in the
implementation, even if they have different labels.
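
Roughly, the label scoping and merging could look like the Java sketch
below.  All names here are hypothetical (ComponentScopedRepository is
not the real IndexRepository), and for brevity it only merges indexes
declared over the identical type with identical sort keys; the
ancestor-type case is left out.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical physical index, identified by its type and sort keys.
final class SharedIndex {
    final String type;
    final List<String> sortKeys;

    SharedIndex(String type, List<String> sortKeys) {
        this.type = type;
        this.sortKeys = List.copyOf(sortKeys);
    }
}

// Hypothetical repository where labels are private to each component.
final class ComponentScopedRepository {
    // One physical index per (type, sort keys) pair, shared across labels.
    private final Map<List<Object>, SharedIndex> physical = new HashMap<>();
    // component name -> (label -> physical index)
    private final Map<String, Map<String, SharedIndex>> labels = new HashMap<>();
    private String currentComponent;

    void declare(String component, String label, String type, List<String> sortKeys) {
        List<Object> key = List.of(type, List.copyOf(sortKeys));
        SharedIndex index =
            physical.computeIfAbsent(key, k -> new SharedIndex(type, sortKeys));
        labels.computeIfAbsent(component, c -> new HashMap<>()).put(label, index);
    }

    // The framework would call this before handing the CAS to a component.
    void setCurrentComponent(String component) {
        this.currentComponent = component;
    }

    // Resolves the label against the current component's declarations only.
    // Returns null if the current component declared no such label.
    SharedIndex getIndex(String label) {
        Map<String, SharedIndex> own = labels.getOrDefault(currentComponent, Map.of());
        return own.get(label);
    }

    int physicalIndexCount() {
        return physical.size();
    }
}
```

So two annotators can both declare a label "idx" over different types
without colliding, while two differently labeled declarations over the
same type and sort keys end up pointing at one physical index.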

