uima-dev mailing list archives

From "Marshall Schor (JIRA)" <...@uima.apache.org>
Subject [jira] [Commented] (UIMA-2434) Feature structure removal from sorted index is very slow
Date Mon, 06 May 2013 13:32:15 GMT

    [ https://issues.apache.org/jira/browse/UIMA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649746#comment-13649746 ]

Marshall Schor commented on UIMA-2434:

Yes, that's right.

We currently have only two ways to reclaim space: a serialization -> deserialization
round trip, or CAS copying. Either one works like a stop-and-copy garbage collection.
> Feature structure removal from sorted index is very slow
> --------------------------------------------------------
>                 Key: UIMA-2434
>                 URL: https://issues.apache.org/jira/browse/UIMA-2434
>             Project: UIMA
>          Issue Type: Improvement
>          Components: Core Java Framework
>    Affects Versions: 2.3.1SDK
>            Reporter: Mikhail Sogrin
>            Assignee: Marshall Schor
>             Fix For: 2.4.1SDK
> Removal of feature structures from sorted indexes (e.g. the default index) is very slow.
> The FSIntArrayIndex.remove() method performs two operations: a linear search through the
> array until the given FS is found, followed by shifting every element after it one
> position to the left.
> If many annotations (millions or more) are deleted at once, this operation becomes
> very slow, much slower than adding those annotations in the first place. It seems to
> require O(N^2) time to remove N annotations.
> First, the linear search can be replaced by the binary search method that is already
> implemented in the same class.
> Second, the array copy can be done with a Java built-in method instead of a custom loop.
> Ideally, a method for bulk removal of a collection of annotations would be the most
> efficient, for example a method to remove all annotations of a given type.
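
The suggestions in the description can be sketched roughly as follows. This is
only an illustration, not the UIMA code: SortedIntIndex, remove() and
removeAll() are hypothetical names, and the sketch assumes the index holds
distinct int keys in ascending order. The real index orders FS addresses with
a comparator, so entries that compare equal would need an extra scan of the
neighbouring positions after the binary search.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a sorted int-array index; not the UIMA FSIntArrayIndex class. */
public class SortedIntIndex {
    private final int[] data;
    private int size;

    /** Assumes the input is sorted ascending and contains no duplicates. */
    public SortedIntIndex(int[] sortedDistinct) {
        this.data = Arrays.copyOf(sortedDistinct, sortedDistinct.length);
        this.size = sortedDistinct.length;
    }

    /** Removes one value: binary search, then a single block move to close the gap. */
    public boolean remove(int value) {
        int pos = Arrays.binarySearch(data, 0, size, value);
        if (pos < 0) {
            return false; // value is not in the index
        }
        // Shift the tail one position to the left with the built-in copy.
        System.arraycopy(data, pos + 1, data, pos, size - pos - 1);
        size--;
        return true;
    }

    /** Bulk removal: drops every value listed in sortedVictims in one pass. */
    public int removeAll(int[] sortedVictims) {
        int write = 0;
        int v = 0;
        for (int read = 0; read < size; read++) {
            int cur = data[read];
            // Advance the victim cursor past values smaller than the current entry.
            while (v < sortedVictims.length && sortedVictims[v] < cur) {
                v++;
            }
            if (v < sortedVictims.length && sortedVictims[v] == cur) {
                continue; // entry is a victim, do not keep it
            }
            data[write++] = cur; // keep the entry, compacting as we go
        }
        int removed = size - write;
        size = write;
        return removed;
    }

    /** Returns a copy of the live entries. */
    public int[] toArray() {
        return Arrays.copyOf(data, size);
    }
}
```

Note that binary search plus System.arraycopy only lowers the constant factor:
each single removal still moves up to N elements, so removing N entries one at
a time stays quadratic in the worst case. The one-pass removeAll() touches each
entry once, which is why a bulk method is the larger win.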

