uima-dev mailing list archives

From "Marshall Schor (JIRA)" <...@uima.apache.org>
Subject [jira] [Created] (UIMA-4111) Change how default bag indices are created
Date Wed, 19 Nov 2014 15:26:40 GMT
Marshall Schor created UIMA-4111:

             Summary: Change how default bag indices are created
                 Key: UIMA-4111
                 URL: https://issues.apache.org/jira/browse/UIMA-4111
             Project: UIMA
          Issue Type: Improvement
          Components: Core Java Framework
            Reporter: Marshall Schor
            Assignee: Marshall Schor
             Fix For: 2.7.0SDK

UIMA-173 added the concept of a universal default bag index for types that would be created
if no other index was defined for that type.  That Jira has a link to the motivation, where
it is clear that this was intended to simplify how UIMA works and allow all feature structures
that were added to the indexes (via addToIndexes()) to be retrieved.

UIMA-297 corrected some anomalies in the original implementation.

This Jira is to correct the edge cases that happen when there are only Set indices defined
for a type.  A Set index does not add the 2nd or subsequent FSs whose key values match,
under the Set's comparator definition, an FS already in the index, so the original motivation
of the default bag index is thwarted in this case.  This has caused several edge-case issues;
for example, a special note about this surprising behavior had to be included in the UIMA
documentation.
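
To make the Set behavior concrete, here is a minimal sketch in plain Java. It is a toy model
under stated assumptions, not UIMA code: the class SetIndexSketch, the Fs type and both
methods are hypothetical, a java.util.TreeSet with a comparator stands in for a Set index,
and an ArrayList stands in for a bag index.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

/** Toy model only: none of these names are UIMA API. */
final class SetIndexSketch {

    /** Hypothetical stand-in for a feature structure (FS) of type T. */
    static final class Fs {
        final String key; // the only feature the Set comparator looks at
        final int id;     // tells apart instances whose keys are equal

        Fs(String key, int id) {
            this.key = key;
            this.id = id;
        }
    }

    /** Adds each FS to a set ordered by key; returns how many were kept. */
    static int keptBySetIndex(List<Fs> fss) {
        TreeSet<Fs> setIndex = new TreeSet<>(Comparator.comparing((Fs f) -> f.key));
        for (Fs fs : fss) {
            setIndex.add(fs); // silently a no-op if an equal-keyed FS is present
        }
        return setIndex.size();
    }

    /** Adds each FS to a bag (a plain list); returns how many were kept. */
    static int keptByBagIndex(List<Fs> fss) {
        List<Fs> bagIndex = new ArrayList<>();
        for (Fs fs : fss) {
            bagIndex.add(fs); // a bag keeps every FS it is given
        }
        return bagIndex.size();
    }

    public static void main(String[] args) {
        List<Fs> fss = List.of(new Fs("a", 1), new Fs("a", 2), new Fs("b", 3));
        System.out.println("Set index kept " + keptBySetIndex(fss)); // 2
        System.out.println("bag index kept " + keptByBagIndex(fss)); // 3
    }
}
```

In this model the second FS with key "a" is a distinct object, yet the set never records it,
which is the situation the default bag index was meant to prevent.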

More recently, another edge case has been discovered.  Suppose an annotator is contained in
an aggregate that has sufficient index definitions to ensure a non-Set index for type T.  That
annotator is then remoted, and the remote service has only a Set index for type T.  Assume
that the client has added-to-indices 100 instances of type T, the CAS is serialized to the
remote, the remote deserializes the CAS and does 100 add-to-indices, of which perhaps 50
succeed, and the other 50 are no-ops (due to the Set equivalence).  Now when the remote CAS
is returned, only 50 will appear in the index back at the client.  This goes against the
principle in UIMA that remoting of components should not affect the semantics, where possible.
It is also quite a surprising effect, which most users won't expect.  It is also an "unstable"
effect: if a pipeline "assembler" (knowing little about the "internals" of the components)
were to add a component to the remote which included a non-Set index for type T, the pipeline
would start behaving differently, no longer losing any indexed items.
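
The 100-versus-50 arithmetic above can be replayed in a small stand-alone Java sketch. It
models nothing of UIMA serialization: the class RemoteRoundTripSketch and its methods are
hypothetical, each FS is reduced to an integer key, and the assumption that exactly 50 of
the 100 keys are distinct is taken from the "perhaps 50" in the example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model only: none of these names are UIMA API. */
final class RemoteRoundTripSketch {

    /**
     * Replays the client's add-to-indices calls on the "remote" and returns
     * how many FSs are still indexed when the CAS comes back. Each FS is
     * reduced to the integer key that the remote's Set comparator inspects.
     */
    static int indexedAfterRoundTrip(List<Integer> keys, boolean remoteHasNonSetIndex) {
        Set<Integer> setIndex = new HashSet<>();
        List<Integer> nonSetIndex = new ArrayList<>();
        for (Integer key : keys) {
            setIndex.add(key); // no-op for a key that is already present
            if (remoteHasNonSetIndex) {
                nonSetIndex.add(key); // a bag or sorted index records every add
            }
        }
        return remoteHasNonSetIndex ? nonSetIndex.size() : setIndex.size();
    }

    /** Builds 100 keys with only 50 distinct values (assumed split). */
    static List<Integer> hundredKeysFiftyDistinct() {
        List<Integer> keys = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            keys.add(i % 50);
        }
        return keys;
    }

    public static void main(String[] args) {
        List<Integer> keys = hundredKeysFiftyDistinct();
        System.out.println(indexedAfterRoundTrip(keys, false)); // prints 50
        System.out.println(indexedAfterRoundTrip(keys, true));  // prints 100
    }
}
```

Flipping the single boolean, which corresponds to an assembler adding one more component to
the remote, changes how many FSs the client gets back.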

The converse would also be true: if the remote had no indices defined for type T, then add-to-indices
for type T would be recorded in lazily created default bag indices, and those FSs would
be sent back to the client.  If an assembler were now to add a component that contained only
a Set index definition for type T, the remote would suddenly start dropping FSs that were
excluded due to the Set comparator.

For all these reasons (discovered in discussions with Edward Epstein and Adam Lally), and
because of the original intent of this default bag index (discovered by reading the mail archives
pointed to by the above two Jiras, which describe the motivations in some detail), this Jira
changes the logic for when the default bag index is created: it is now created whenever some
add-to-indices event would otherwise not record an addition (e.g., if there were no indices,
or only Set indices and the FS matched elements already in the Sets).
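
One way to read the rule just described is sketched below in plain Java. This is an
illustration of the rule as worded in this Jira, not the actual UIMA implementation: the
class DefaultBagSketch and all of its members are hypothetical, each FS is reduced to the
key string its Set comparator would inspect, and at most one Set index is modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Toy model of the rule described above; not the UIMA implementation. */
final class DefaultBagSketch {
    private final boolean hasSetIndex;
    private final TreeSet<String> setIndex = new TreeSet<>();
    private List<String> defaultBag; // stays null until it is first needed

    DefaultBagSketch(boolean hasSetIndex) {
        this.hasSetIndex = hasSetIndex;
    }

    /** Records the FS somewhere, falling back to a lazily created bag. */
    void addToIndexes(String fsKey) {
        boolean recorded = hasSetIndex && setIndex.add(fsKey);
        if (!recorded) {
            if (defaultBag == null) {
                defaultBag = new ArrayList<>(); // created only on first miss
            }
            defaultBag.add(fsKey);
        }
    }

    boolean defaultBagCreated() {
        return defaultBag != null;
    }

    /** Number of FSs that some index has recorded. */
    int indexedCount() {
        return setIndex.size() + (defaultBag == null ? 0 : defaultBag.size());
    }

    public static void main(String[] args) {
        DefaultBagSketch onlySet = new DefaultBagSketch(true);
        for (int i = 0; i < 100; i++) {
            onlySet.addToIndexes("key" + (i % 50)); // 50 distinct keys
        }
        System.out.println(onlySet.indexedCount()); // prints 100
    }
}
```

Under this reading, every add-to-indices call is recorded exactly once, so the count seen
after a round trip no longer depends on whether only Set indices happen to be defined.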

This change will affect documentation, so update that too.  In particular, the NOTE in this
section http://uima.apache.org/d/uimaj-2.6.0/tutorials_and_users_guides.html#ugr.tug.aae.reading_results_previous_annotators
will no longer apply.

The behavior of getAllIndexedFS(type) will change - it will no longer have an exception for
the special case where only Set indices were defined for the type.

Because it seems extremely unlikely that the previous behavior was being depended
upon, there is no global flag to restore the previous behavior.

This message was sent by Atlassian JIRA
