uima-dev mailing list archives

From Marshall Schor <...@schor.com>
Subject Re: [jira] [Created] (UIMA-4135) support for modifying indexed FSs
Date Tue, 02 Dec 2014 19:47:20 GMT

On 12/2/2014 11:51 AM, Richard Eckart de Castilho wrote:
> On 02.12.2014, at 17:14, Marshall Schor <msa@schor.com> wrote:
>
>> A subsequent discussion with Burn L. produced the following two good ideas:
>>
>> 1) The UIMA framework could automatically do the safe thing on each feature
>> modification that required it.  Although this might seem inefficient, it is
>> likely that in most cases, only one feature (used as a key in some index spec)
>> is being modified at any one time.  For those cases where this isn't true, the
>> alternative of an index protection block encapsulating multiple updates could be
>> used; but it's likely that would rarely be needed.
>>
>> The automatic approach would, in effect, do a remove, modify, add-back cycle for
>> each feature modification, in all indices where the FS was in the index, if the
>> feature was used as a key.
>>
>> This would be a boon to users - as their code would now work without the danger
>> of accidentally corrupting indices.
> Sounds good :)
>
> So by default, the CAS would protect itself. When a protection block (I cannot
> help thinking of this as a kind of transaction) is used, then the protection
> would be temporarily disabled and the modifications would be written to some
> kind of "transaction log". When the block is closed, the log is "committed",
> basically removing/readding all the modified FSes. Did I paraphrase this correctly?

Assuming we have "normally" the "automatic" style in effect, then yes, inside
the protection block the automatic style would be temporarily disabled.  The
"removes" would still be done, but the info needed to do the add-backs would be
kept.  So, at the point of a feature update, the remove (only if needed, of
course) would be done, and the update, but not the "add-back".  (Remember that
doing the update before the remove causes index corruption.)  For the second and
subsequent updates to (other) features of that FS, no index operations would be
done (it would already be "removed").  And then at the end, only the re-adding
of whatever was removed would be done.
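
To make the deferred add-back concrete, here is a minimal sketch in plain Java.
It is not the UIMA API: Span, ProtectedIndex and protect() are hypothetical
names, and a TreeSet stands in for a sorted index.  It assumes one index whose
keys are the "begin" and "end" features.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

/** Toy stand-in for a feature structure; begin and end are both index keys. */
final class Span {
    final String id;
    int begin;
    int end;
    Span(String id, int begin, int end) { this.id = id; this.begin = begin; this.end = end; }
}

/** Toy sorted index (not the UIMA API) with a hypothetical protection block. */
final class ProtectedIndex {
    private final TreeSet<Span> sorted = new TreeSet<>(
        Comparator.comparingInt((Span s) -> s.begin)
                  .thenComparingInt(s -> s.end)
                  .thenComparing(s -> s.id));
    private List<Span> pending = null; // non-null only while a block is open
    int removes = 0;
    int addBacks = 0;

    void add(Span s) { sorted.add(s); }

    /** Runs the updates with add-backs deferred until the block ends. */
    void protect(Runnable updates) {
        pending = new ArrayList<>();
        try {
            updates.run();
        } finally {
            for (Span s : pending) { sorted.add(s); addBacks++; }
            pending = null;
        }
    }

    void setBegin(Span s, int value) { update(s, () -> s.begin = value); }
    void setEnd(Span s, int value)   { update(s, () -> s.end = value); }

    private void update(Span s, Runnable mutation) {
        boolean alreadyRemoved = pending != null && pending.contains(s);
        // The remove must happen before the key changes, or the lookup fails.
        boolean removedNow = !alreadyRemoved && sorted.remove(s);
        if (removedNow) removes++;
        mutation.run();
        if (removedNow) {
            if (pending != null) pending.add(s);     // inside a block: defer
            else { sorted.add(s); addBacks++; }      // automatic style: add back now
        }
    }

    List<String> order() {
        List<String> ids = new ArrayList<>();
        for (Span s : sorted) ids.add(s.id);
        return ids;
    }

    public static void main(String[] args) {
        ProtectedIndex index = new ProtectedIndex();
        Span a = new Span("a", 0, 5);
        Span b = new Span("b", 10, 15);
        index.add(a);
        index.add(b);
        index.protect(() -> { index.setBegin(a, 20); index.setEnd(a, 25); });
        // prints: [b, a] removes=1 addBacks=1
        System.out.println(index.order() + " removes=" + index.removes
            + " addBacks=" + index.addBacks);
    }
}
```

Two key features are updated inside the block, but only one remove and one
add-back take place.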

>
> A flow controller or the component base classes could forcibly put the CAS back
> into protection mode in case that the component coder forgot it (and log a warning) -
> or it could even throw an exception in such a case.

This would be a partial solution, because there are cases where there would not
be a flow controller involved, or even a base class.  A complete solution is to
have the API follow the style of using an inner class as discussed in previous
notes in this change. 

This "bulk" mode, though, I think would be the exception, because most users set
lots of features when a new FS is created, but then the "typical" mode is to
update just a few (I'm guessing :-) ).  If this is true, then the "automatic"
mode (discussed at the top) would work, and the "bulk" mode would be relegated
to just an optimization for a less-usual case.  So, most people would not need
to do anything, and their code would start working without corrupting indices.
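
A rough illustration of why the "automatic" style matters, again in plain Java
with a TreeSet standing in for a sorted index.  Token and AutoProtectDemo are
hypothetical names, not UIMA classes.  The first half changes a key while the
item is still indexed; the second half does the remove, update, add-back cycle.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

/** Toy stand-in for a feature structure whose "begin" feature is an index key. */
final class Token {
    final String id;
    int begin;
    Token(String id, int begin) { this.id = id; this.begin = begin; }
}

/** Hypothetical demo class; the TreeSet plays the role of a sorted index. */
final class AutoProtectDemo {
    static TreeSet<Token> newIndex(Token... tokens) {
        TreeSet<Token> index = new TreeSet<>(
            Comparator.comparingInt((Token t) -> t.begin).thenComparing(t -> t.id));
        for (Token t : tokens) index.add(t);
        return index;
    }

    /** Automatic style: remove (only if indexed), update, add back. */
    static void setBegin(TreeSet<Token> index, Token t, int value) {
        boolean wasIndexed = index.remove(t); // before the key changes
        t.begin = value;
        if (wasIndexed) index.add(t);
    }

    static List<Integer> begins(TreeSet<Token> index) {
        List<Integer> out = new ArrayList<>();
        for (Token t : index) out.add(t.begin);
        return out;
    }

    public static void main(String[] args) {
        Token a = new Token("a", 0);
        TreeSet<Token> unprotected = newIndex(a, new Token("b", 10), new Token("c", 20));
        a.begin = 30; // key changed while still indexed: the stored order is stale
        System.out.println("unprotected: " + begins(unprotected)); // [30, 10, 20]

        Token a2 = new Token("a", 0);
        TreeSet<Token> automatic = newIndex(a2, new Token("b", 10), new Token("c", 20));
        setBegin(automatic, a2, 30);
        System.out.println("automatic:   " + begins(automatic));   // [10, 20, 30]
    }
}
```

The user code in the second half is a single setter call; the index operations
happen behind it.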

-Marshall
>
>> 2) Because this would turn a feature update into (potentially) a remove - update
>> - add operation, users writing feature updates inside an iterator would be
>> exposed to suddenly getting "illegal index modification while iterating" exceptions.
>>
>> This has long been an issue, I think, causing users to write loops that extract
>> FSs into array lists and then iterate over those, while doing UIMA index adds/
>> removes.
> Totally :)
>
>> How about we add a method to our iterator creation suite, perhaps named
>> safeIterator(), which creates a snapshot of the index it's iterating over at the
>> start, and then allows the user code to do arbitrary index adds/removes?  
> Sounds good as well. I think that some UIMA core iterator already copies FSes to
> some collection before returning it. Some of the uimaFIT select*() methods certainly
> do this (but not all - and it is not advertised to users).
>
>> It seems this occurs frequently enough to warrant UIMA built-in support, and some
>> optimizations may be available. It seems it could be especially helpful if (1)
>> were implemented, because the remove/add could occur unbeknownst to the user. 
>> For example, the component writer may not have had a feature in any index, but
>> when his component was combined with others, an index could have been added that
>> used the feature.
>>
>> WDYT?
> It is probably not a common problem, but from the perspective of the architecture,
> it would be good to avoid negative side-effects from a third component adding an
> index that could cause undesired or even wrong behavior.
>
> Cheers,
>
> -- Richard
>
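
The safeIterator() idea from point 2 could look roughly like the sketch below.
This is plain Java, not UIMA code: Anno and SnapshotIterDemo are hypothetical
names, a TreeSet stands in for the index, and the fail-fast
ConcurrentModificationException of java.util collections stands in for the
"illegal index modification while iterating" exception.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;

/** Toy stand-in for an annotation whose "begin" feature is an index key. */
final class Anno {
    final String id;
    int begin;
    Anno(String id, int begin) { this.id = id; this.begin = begin; }
}

/** Hypothetical demo class; the TreeSet plays the role of a sorted index. */
final class SnapshotIterDemo {
    static TreeSet<Anno> newIndex() {
        TreeSet<Anno> index = new TreeSet<>(
            Comparator.comparingInt((Anno a) -> a.begin).thenComparing(a -> a.id));
        index.add(new Anno("a", 0));
        index.add(new Anno("b", 10));
        index.add(new Anno("c", 20));
        return index;
    }

    /** Sketch of safeIterator(): iterates over a copy taken at creation time. */
    static Iterator<Anno> safeIterator(TreeSet<Anno> index) {
        return new ArrayList<>(index).iterator();
    }

    /** Remove, update, add back: the cycle an automatic mode would run. */
    static void mirror(TreeSet<Anno> index, Anno a) {
        index.remove(a);
        a.begin = 100 - a.begin;
        index.add(a);
    }

    /** Updating through the live iterator trips the fail-fast check. */
    static boolean liveIterationFails() {
        TreeSet<Anno> index = newIndex();
        try {
            for (Anno a : index) mirror(index, a);
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    /** Updating while walking the snapshot leaves the index free to change. */
    static List<String> snapshotIteration() {
        TreeSet<Anno> index = newIndex();
        Iterator<Anno> it = safeIterator(index);
        while (it.hasNext()) mirror(index, it.next());
        List<String> ids = new ArrayList<>();
        for (Anno a : index) ids.add(a.id);
        return ids;
    }

    public static void main(String[] args) {
        System.out.println("live iteration fails: " + liveIterationFails()); // true
        System.out.println("after snapshot pass:  " + snapshotIteration());  // [c, b, a]
    }
}
```

The snapshot costs one copy of the index contents up front, which is the same
price users already pay when they copy FSs into an array list by hand.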

