uima-dev mailing list archives

From Marshall Schor <...@schor.com>
Subject Unique IDs for Feature Structure instances - 3 observations
Date Tue, 05 Mar 2013 17:17:36 GMT
Some of this has been previously stated.  I'm summarizing :-)

It seems unique FS ids would be nice to have at runtime, not just externally.

Assigning them at runtime has potential issues for "parallel" processing of
CASes.  Parallelism can arise in UIMA-AS scheduling using the flow controller
parallel-step option. 

This can also arise in a simple application associated with a CAS Store, where
the operation is to deserialize an existing CAS, add FSs to it, and reserialize
the result back to the store *under the same CAS id*.

The parallel use case here is that many of these operations could occur
simultaneously. Of course, the reserializing would need to take account of the
"high-water-mark" - just as is done for the flow-controller parallel-step
option.  In that case, we also declare it "illegal" for annotators to update
feature structures "below the high-water-mark": if two annotators updated the
same slot, the later one would "win" and the earlier update would be "lost".
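
To make the high-water-mark rule concrete, here is a minimal sketch.  It is a
toy model of a CAS as a flat list of integer slots, not the UIMA API; the class
HighWaterMarkHeap and all of its methods are hypothetical names made up for
illustration.  It assumes that everything present at deserialization time lies
below the mark, that writes there are rejected, and that only the slots added
above the mark are sent back.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (hypothetical, not UIMA code): slots below the mark are read-only. */
final class HighWaterMarkHeap {
    private final List<Integer> slots = new ArrayList<>();
    private final int highWaterMark;

    /** Everything that exists at deserialization time lies below the mark. */
    HighWaterMarkHeap(List<Integer> existing) {
        slots.addAll(existing);
        highWaterMark = slots.size();
    }

    /** Adds a new slot above the mark and returns its address. */
    int add(int value) {
        slots.add(value);
        return slots.size() - 1;
    }

    int get(int addr) {
        return slots.get(addr);
    }

    /** Rejects writes below the mark, where a parallel writer could be overwritten. */
    void set(int addr, int value) {
        if (addr < highWaterMark) {
            throw new IllegalStateException(
                "update below the high-water-mark at address " + addr);
        }
        slots.set(addr, value);
    }

    /** Only the slots added above the mark need to be reserialized. */
    List<Integer> delta() {
        return new ArrayList<>(slots.subList(highWaterMark, slots.size()));
    }
}
```

With this restriction, two parallel workers that start from the same stored CAS
each return only their own additions, so neither can silently overwrite a slot
the other one changed.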

Running in parallel means it may be hard to assign the "next" available unique
FS id at FS creation time, so that's a problem to address.

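One possible way to approach the "next id" problem, offered only as a sketch:
reserve disjoint blocks of ids up front, so each parallel worker can number its
new FSs without coordinating on every creation.  The names FsIdBlockAllocator
and FsIdBlock are hypothetical and nothing here is existing UIMA code.  The
sketch also assumes all workers can reach one shared allocator, which may not
hold when the parallel steps run in separate processes.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: hands out disjoint id ranges to parallel workers. */
final class FsIdBlockAllocator {
    private final AtomicLong next;
    private final int blockSize;

    FsIdBlockAllocator(long firstFreeId, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        this.next = new AtomicLong(firstFreeId);
        this.blockSize = blockSize;
    }

    /** Atomically reserves the next range; no two callers get overlapping ids. */
    FsIdBlock reserveBlock() {
        long start = next.getAndAdd(blockSize);
        return new FsIdBlock(start, start + blockSize);
    }
}

/** A private range [start, end) that one worker consumes without locking. */
final class FsIdBlock {
    private long cursor;
    private final long end;

    FsIdBlock(long start, long end) {
        this.cursor = start;
        this.end = end;
    }

    boolean hasNext() {
        return cursor < end;
    }

    long nextId() {
        if (cursor >= end) {
            throw new IllegalStateException("block exhausted; reserve another one");
        }
        return cursor++;
    }
}
```

The trade-off is that ids are unique but not dense: a worker that creates fewer
FSs than its block holds leaves a gap in the sequence.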

Another (potential) problem: if the FS id is added, this could significantly
increase the CAS size.  For some applications, this could be an issue.  So I
hope the architecture allows modes of operation where no space is taken in the
CAS for this.  Something like this may also be needed for backwards
compatibility.


It may be that many FSs in the CAS won't need a unique FSid.  An example: UIMA
supports lists made out of Lisp-like "cons" cells - the FSList structure has 2
slots - one is a reference (or nil) to the next cons object, the other is a
reference to the item in the list at that spot.  I've seen applications that
have thousands or more of these cons cells.  They are never individually "indexed"
(except perhaps occasionally the "head" of the list), but just serve to create
the list.

I wonder if an architecture for unique FSids could account for this, and not
have any overhead for some FeatureStructures which won't need a unique FSid.
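
One way this could look, purely as a sketch: keep the ids in a side table that
is filled in only when somebody asks for the id of a particular FS, so FSs that
are never asked about cost nothing.  ConsCell below is a hypothetical stand-in
for a list cell, not the UIMA FSList type, and LazyFsIdTable is likewise an
invented name.

```java
import java.util.IdentityHashMap;
import java.util.Map;

/** Hypothetical stand-in for a Lisp-like cons cell; not the UIMA FSList type. */
final class ConsCell {
    final Object head;    // the item at this position in the list
    final ConsCell tail;  // the next cell, or null at the end of the list

    ConsCell(Object head, ConsCell tail) {
        this.head = head;
        this.tail = tail;
    }
}

/** Hypothetical side table: an id is stored only for objects that were asked about. */
final class LazyFsIdTable {
    // Keyed by object identity, so two equal-looking FSs still get distinct ids.
    private final Map<Object, Long> ids = new IdentityHashMap<>();
    private long nextId = 1;

    /** Returns the existing id, or assigns the next one on first request. */
    synchronized long idOf(Object fs) {
        Long existing = ids.get(fs);
        if (existing != null) {
            return existing;
        }
        long id = nextId++;
        ids.put(fs, id);
        return id;
    }

    synchronized boolean hasId(Object fs) {
        return ids.containsKey(fs);
    }

    /** Number of objects that currently carry an id. */
    synchronized int size() {
        return ids.size();
    }
}
```

For a list of thousands of cons cells where only the head is ever referenced by
id, the table would hold a single entry.  Note that the table in this sketch
keeps strong references to its keys, so entries stay until they are removed
explicitly.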

