uima-dev mailing list archives

From "Marshall Schor (JIRA)" <...@uima.apache.org>
Subject [jira] [Resolved] (UIMA-4357) create auxiliary flattened version of index and its subtypes, automatically managed
Date Tue, 05 May 2015 03:42:06 GMT

     [ https://issues.apache.org/jira/browse/UIMA-4357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marshall Schor resolved UIMA-4357.
    Resolution: Fixed

> create auxiliary flattened version of index and its subtypes, automatically managed
> -----------------------------------------------------------------------------------
>                 Key: UIMA-4357
>                 URL: https://issues.apache.org/jira/browse/UIMA-4357
>             Project: UIMA
>          Issue Type: Improvement
>          Components: Core Java Framework
>            Reporter: Marshall Schor
>            Priority: Minor
>             Fix For: 2.7.1SDK
> UIMA indexes allow retrieving items from the CAS, trading off space (for indexes) for
time (speed of finding items in the CAS, speed of iterating).  For sorted indexes over a type
with subtypes, if the index isn't being modified, it is possible to do a one-time extraction
in sorted order of the items and save this in an array, and iterate much more rapidly over
that. I've seen lots of cases of UIMA flows where some annotators will create and index a
type (and its subtypes), and once that's been done, the indexes are not subsequently updated
for these types, but downstream annotators iterate over them.  It seems that a lazy creation
for this kind of flattened index would work well in many cases.
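
The one-time extraction described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not UIMA code: it assumes each type and subtype keeps its own already-sorted list, and FlattenSketch, Cursor, and flatten are hypothetical names. It merges the per-type lists once with a heap so that later iteration is a plain walk over the result.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; none of these names are UIMA APIs.
final class FlattenSketch {

  /** Read position within one already-sorted sub-index (one per type or subtype). */
  private static final class Cursor<T> {
    final List<T> items;
    int pos = 0;

    Cursor(List<T> items) {
      this.items = items;
    }

    T peek() {
      return items.get(pos);
    }
  }

  /** Merges the sorted sub-indexes once and returns all items in overall sorted order. */
  static <T> List<T> flatten(List<List<T>> sortedSubIndexes, Comparator<? super T> order) {
    // Order the cursors by the item each one currently points at.
    Comparator<Cursor<T>> byHead = (a, b) -> order.compare(a.peek(), b.peek());
    PriorityQueue<Cursor<T>> heap = new PriorityQueue<>(byHead);

    int total = 0;
    for (List<T> sub : sortedSubIndexes) {
      total += sub.size();
      if (!sub.isEmpty()) {
        heap.add(new Cursor<>(sub));
      }
    }

    List<T> flat = new ArrayList<>(total);
    while (!heap.isEmpty()) {
      Cursor<T> c = heap.poll();   // cursor holding the smallest remaining item
      flat.add(c.peek());
      c.pos++;
      if (c.pos < c.items.size()) {
        heap.add(c);               // re-insert so the heap is reordered by its new head
      }
    }
    return flat;
  }
}
```

Once built, iterating the flat list needs no per-step reordering of sub-iterators, which is where the speedup would come from.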
> It is important, I think, to continue to preserve the same kind of ConcurrentModificationException
detection.  To make this additional index space-time trade-off automatic and reasonable, make
the additional index reachable via a SoftReference, to allow the GC to reclaim the space if
> Delay the creation of a flattened version until there's evidence that it will be unmodified
for some time.  To count things that motivate its creation, count the number of times an iterator
over an index is using the code "heapifyUp/Down" that manages the ordering of the subiterators
to preserve sort order.  A basic indicator may be the number of times that occurs, without
an intervening update to the indexes, relative to the size of the index.
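
One possible shape for that indicator, as a sketch only: FlattenTrigger, its method names, and the factor knob are all hypothetical and not taken from the UIMA code base. It counts sub-iterator reorderings since the last index update and compares the count with the index size.

```java
// Hypothetical illustration only; not UIMA code.
final class FlattenTrigger {

  /** Assumed tuning knob: how many reorderings per indexed item justify flattening. */
  private final int factor;

  /** Reorderings (heapifyUp/Down calls) seen since the last index update. */
  private long reorderCount = 0;

  FlattenTrigger(int factor) {
    this.factor = factor;
  }

  /** Call whenever an iterator reorders its sub-iterators. */
  void onSubIteratorReorder() {
    reorderCount++;
  }

  /** Call whenever the index is added to or removed from; the evidence starts over. */
  void onIndexUpdate() {
    reorderCount = 0;
  }

  /** True once the uninterrupted reorder count is large relative to the index size. */
  boolean shouldFlatten(int indexSize) {
    return indexSize > 0 && reorderCount >= (long) factor * indexSize;
  }
}
```

With a factor of 2, for example, an index of 3 items would qualify after 6 reorderings with no update in between.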
> The flattened array can save a bit more time by holding references to the Java cover
class (JCas or non-JCas) for this object. 
> CAS reset needs to clear out these things.
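
To make the SoftReference, ConcurrentModificationException, and reset points concrete, here is a minimal sketch. It is an assumption-laden illustration rather than the actual implementation: FlatIndexCache, its builder callback, and the modification counter are hypothetical names. The flat copy is held softly, rebuilt lazily, dropped on any index update or reset, and iterators fail fast if the index changed after they were created.

```java
import java.lang.ref.SoftReference;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Supplier;

// Hypothetical illustration only; not UIMA code.
final class FlatIndexCache<T> {

  /** Assumed callback that produces the flattened, sorted copy on demand. */
  private final Supplier<List<T>> builder;

  /** Soft reference so the GC may reclaim the flat copy under memory pressure. */
  private SoftReference<List<T>> ref = new SoftReference<>(null);

  /** Bumped on every index update or reset; iterators compare against it. */
  private int modCount = 0;

  FlatIndexCache(Supplier<List<T>> builder) {
    this.builder = builder;
  }

  /** Call on any add to or remove from the underlying index. */
  void onIndexUpdate() {
    modCount++;
    ref = new SoftReference<>(null);
  }

  /** Call on CAS reset: drop the flat copy and invalidate live iterators. */
  void reset() {
    modCount++;
    ref = new SoftReference<>(null);
  }

  Iterator<T> iterator() {
    List<T> flat = ref.get();
    if (flat == null) {              // never built, invalidated, or reclaimed by the GC
      flat = builder.get();
      ref = new SoftReference<>(flat);
    }
    final List<T> snapshot = flat;   // strong reference for the iterator's lifetime
    final int expected = modCount;
    return new Iterator<T>() {
      private int pos = 0;

      @Override
      public boolean hasNext() {
        return pos < snapshot.size();
      }

      @Override
      public T next() {
        if (modCount != expected) {
          throw new ConcurrentModificationException();
        }
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        return snapshot.get(pos++);
      }
    };
  }
}
```

The iterator keeps a strong reference to the copy it started with, so a GC reclaim of the soft reference only affects iterators created afterwards.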

This message was sent by Atlassian JIRA
