uima-dev mailing list archives

From Richard Eckart de Castilho <wyldf...@users.sourceforge.net>
Subject Re: How to properly update offsets of an annotation?
Date Thu, 09 Oct 2014 16:18:31 GMT
Test case here:

https://issues.apache.org/jira/browse/UIMA-4049

Cheers,

-- Richard

On 19.08.2014, at 23:18, Marshall Schor <msa@schor.com> wrote:

> easy test case?
> 
> Then I'll take a look :-) -Marshall
> On 8/13/2014 10:44 AM, Richard Eckart de Castilho wrote:
>> On 13.08.2014, at 14:49, Marshall Schor <msa@schor.com> wrote:
>> 
>>> Hi,
>>> 
>>> some things to check: 
>>> 
>>> When you say the tokens "remain in the CAS", I think you mean the tokens remain
>>> in one or more indexes, because, of course, removeFromIndexes doesn't remove
>>> things from the CAS.
>> Sure. 
>> 
>>> The behavior of removeFromIndexes depends on the kinds of indexes you have
>>> defined; if you have a bag or sorted index (that is, not a "set" index), then
>>> it is quite possible to "add-to-indexes" the same feature structure multiple
>>> times. If this has happened, and you then do just one "remove-from-index", the
>>> other entry would still be in the index.
>> No custom indexes are defined - so we're talking only about the default index
>> over the Annotation type.
>> 
>>> What kinds of indexes do you have defined here, and what index is being
>>> selected for use in the
>>> 
>>> "for (def token : ....)"
>>> 
>>> syntax?
>> The annotation index is used here: cas.getAnnotationIndex(type)
>> 
>> Mind that the only difference between the tests I did was the text and
>> consequently the number of tokens and different offsets. The rest of the
>> setup (type system, indexes, etc) was all the same.
>> 
>> I'm confused...
>> 
>> Cheers,
>> 
>> -- Richard
>> 
>>> On 8/13/2014 5:33 AM, Richard Eckart de Castilho wrote:
>>>> Hi all,
>>>> 
>>>> I am facing a very odd situation with the following type of (pseudo-)code:
>>>> 
>>>> def previousToken;
>>>> def toDelete = [];
>>>> for (def token : select(jcas, Token)) {
>>>>   if (previousToken && isName(previousToken, token)) {
>>>>     token.setBegin(previousToken.getBegin());
>>>>     toDelete.add(previousToken);
>>>>   }
>>>>   previousToken = token;
>>>> }
>>>> 
>>>> for (def token : toDelete) {
>>>>   token.removeFromIndexes();
>>>> }
>>>> 
>>>> Depending on the text in the CAS, sometimes I get
>>>> the effect that the tokens in toDelete actually remain
>>>> in the CAS.
>>>> 
>>>> I tried a different approach in which I also record the
>>>> tokens with the updated begin offset and then do a
>>>> 
>>>> for (def token : toReindex) {
>>>>   token.removeFromIndexes();
>>>>   token.addToIndexes();
>>>> }
>>>> 
>>>> That seems to flip around the situation. If a token was
>>>> previously correctly removed, it now remains, and if a
>>>> token was not removed, it is removed now.
>>>> 
>>>> I would like to avoid having to create a new token annotation
>>>> with new offsets and then delete both the old annotations.
>>>> 
>>>> If need be, I can probably set up a minimal test case, but
>>>> before that, maybe somebody could give me a clue...
>>>> 
>>>> Cheers!
>>>> 
>>>> -- Richard
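
The symptom described in the original message is consistent with changing a sort key (here, the begin offset) while the annotation is still held in a sorted index. The Java sketch below illustrates that general failure mode with a plain java.util.TreeSet standing in for the index. It is an analogy only, not UIMA's index implementation. The Span class, the ORDER comparator (begin ascending, then end descending, roughly mirroring the ordering of the built-in annotation index) and the method names are all assumptions made for this example.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical stand-in for an annotation index; not UIMA code.
final class StaleIndexKeyDemo {

    // Minimal mutable span with the two offsets that act as the sort key.
    static final class Span {
        int begin;
        int end;
        final String text;

        Span(int begin, int end, String text) {
            this.begin = begin;
            this.end = end;
            this.text = text;
        }
    }

    // Assumed ordering: begin ascending, then end descending.
    static final Comparator<Span> ORDER = (a, b) -> {
        int c = Integer.compare(a.begin, b.begin);
        return c != 0 ? c : Integer.compare(b.end, a.end);
    };

    // Changes the key of an element that is still in the sorted set,
    // then tries to remove its (unmodified) predecessor.
    static boolean mutateInPlaceThenRemove() {
        Span john = new Span(0, 4, "John");
        Span smith = new Span(5, 10, "Smith");
        Span went = new Span(11, 15, "went");
        TreeSet<Span> index = new TreeSet<>(ORDER);
        index.add(john);
        index.add(smith);
        index.add(went);

        smith.begin = john.begin;   // key changed while still indexed
        return index.remove(john);  // lookup is misdirected by the stale entry
    }

    // Same merge, but the element is taken out before its key changes
    // and put back afterwards, so the set stays consistently ordered.
    static boolean removeMutateReAdd() {
        Span john = new Span(0, 4, "John");
        Span smith = new Span(5, 10, "Smith");
        Span went = new Span(11, 15, "went");
        TreeSet<Span> index = new TreeSet<>(ORDER);
        index.add(john);
        index.add(smith);
        index.add(went);

        index.remove(smith);        // remove while the key is still valid
        smith.begin = john.begin;   // now it is safe to change the key
        index.add(smith);           // re-insert at the correct position
        return index.remove(john);
    }

    public static void main(String[] args) {
        System.out.println("mutate in place, then remove: " + mutateInPlaceThenRemove());
        System.out.println("remove, mutate, re-add:       " + removeMutateReAdd());
    }
}
```

In this sketch the first variant fails to remove "John" even though it was never modified, because the lookup is routed past an entry whose key no longer matches its position. With real annotations the analogous sequence would be token.removeFromIndexes(), token.setBegin(...), token.addToIndexes(), performed before previousToken is removed; whether that fully explains the behaviour reported in this thread is not established here.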

