uima-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Marshall Schor (JIRA)" <...@uima.apache.org>
Subject [jira] [Reopened] (UIMA-2434) Feature structure removal from sorted index is very slow
Date Thu, 02 May 2013 16:30:16 GMT

     [ https://issues.apache.org/jira/browse/UIMA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Marshall Schor reopened UIMA-2434:
----------------------------------


Two issues: test case broken, and need to re-evaluate strategy for remove-all.  There is a
trade-off to be made in the design.  The current trade-off: improves reliability by allowing
existing iterators to function (if you do a modification-resetting operation, like move-to-first)
after a remove-all.  But it does this at the expense of not letting Java recover space from
the indexes - the indexes are reset-in-place, without freeing any storage.  An alternative
impl would free the storage used by the index, but any existing iterators that were iterating
over that type would potentially not "see" the removal because they're holding on to reference
to the before-the-remove-all version of the index.

UIMA's design has these kinds of trade-offs in other places.  One way it's done is to have
two forms of the methods (e.g. in the CAS there are regular methods and "low-level" - use
at your own risk - methods).

So, perhaps we could do the same here, and have a 2nd variety of these methods, e.g.

removeAllIncludingSubtypesReclaimingSpace 

and adding a note that any iterator which might include FeatureStructures being removed needs
to be discarded (and recreated if needed).

Or perhaps this is over-design/over-thinking, and this variety isn't really of sufficient
interest?  Opinions?
                
> Feature structure removal from sorted index is very slow
> --------------------------------------------------------
>
>                 Key: UIMA-2434
>                 URL: https://issues.apache.org/jira/browse/UIMA-2434
>             Project: UIMA
>          Issue Type: Improvement
>          Components: Core Java Framework
>    Affects Versions: 2.3.1SDK
>            Reporter: Mikhail Sogrin
>            Assignee: Marshall Schor
>             Fix For: 2.4.1SDK
>
>
> Removal of feature structures from sorted indexes (e.g. default index) is very slow.
FSIntArrayIndex.remove() method performs two operations: linear search in the array until
the given FS is found, followed by the shift of elements to the end of this array by one position
to the left.
> If many annotations (millions and more) are being deleted at once, this operation gets
very very slow - much slower than adding these annotations in the first place. It seems to
require O(N^2) time to remove N annotations.
> One item is the linear search, which can be replaced by the binary search method, which
is already implemented in the same class.
> Second, array copy can be done with Java built-in method instead of a custom loop.
> Ideally, a method for bulk removal of a collection of annotations would have been the
most efficient, for example a method to remove all annotations of a given type.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message