uima-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Marshall Schor (JIRA)" <...@uima.apache.org>
Subject [jira] [Commented] (UIMA-4049) The curious case of the zombie annotation
Date Sat, 11 Oct 2014 15:13:34 GMT

    [ https://issues.apache.org/jira/browse/UIMA-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168198#comment-14168198
] 

Marshall Schor commented on UIMA-4049:
--------------------------------------

The test case is doing an invalid operation; when that is correct, it works fine.

The invalid operation is that 
  a) An annotation is made and added to the indexes.  This operation uses (reads) the values
of the begin / end features
  b) The "begin" value of that annotation is changed.

       When this is done, the annotation must be re-indexed, because it's no longer in the
"right" position in the index (potentially).  This is what is missing.  To reindex, you remove
and then add back the modified index.

Here's the corrected bit of code:
{code}
      // Merge some tokens, in particular "Franz" "III." -> "Franz III." and "Friedrich"
"II."
      // into "Friedrich II."
      AnnotationFS previous = null;
      List<AnnotationFS> toDelete = new ArrayList();
      List<AnnotationFS> toReindex = new ArrayList();
      for (AnnotationFS token : jcas.getAnnotationIndex(tokenType)) {
          if (previous != null && names.contains(previous.getCoveredText())
                  && suffix.contains(token.getCoveredText())) {
              token.setIntValue(beginFeature, previous.getBegin());
              toReindex.add(token);
              toDelete.add(previous);
          }
          previous = token;
      }
      
      for (AnnotationFS token : toReindex) {
        jcas.removeFsFromIndexes(token);
        jcas.addFsToIndexes(token);
      }
{code}

With this change, the test works fine :-)

> The curious case of the zombie annotation
> -----------------------------------------
>
>                 Key: UIMA-4049
>                 URL: https://issues.apache.org/jira/browse/UIMA-4049
>             Project: UIMA
>          Issue Type: Bug
>          Components: Core Java Framework
>            Reporter: Richard Eckart de Castilho
>            Assignee: Marshall Schor
>
> When annotations are removed from indexes, sometimes they come back... the following
test case shows how an annotation is removed but still present when iterating over the index
later.
> {code}
>     @Test
>     public void testForZombies() throws Exception
>     {
>         // No zombie here
>         int[] offsets1 = { 0, 4, 5, 11, 12, 21, 22, 25, 26, 29, 30, 35, 36, 40, 41, 50,
51, 60, 61,
>                 64, 64, 65 };
>         testForZombies("Dies flößte Friedrich II. für seine neue Eroberung Besorgnis
ein.", offsets1);
>         
>         // Zombie hiding in here
>         int[] offsets2 = { 0, 3, 4, 7, 8, 13, 14, 18, 19, 22, 23, 33, 34, 35 };
>         testForZombies("Ich bin Franz III. von Hammerfels !", offsets2);
>     }
>     public void testForZombies(String aText, int[] aOffsets) throws Exception
>     {
>         // Init some dictionaries we ues
>         Set<String> names = new HashSet<String>();
>         names.add("Friedrich");
>         names.add("Franz");
>         Set<String> suffix = new HashSet<String>();
>         suffix.add("II.");
>         suffix.add("III.");
>         // Set up type system
>         TypeSystemDescription tsd = new TypeSystemDescription_impl();
>         tsd.addType("Token", "", CAS.TYPE_NAME_ANNOTATION);
>         
>         // Create CAS
>         CAS jcas = CasCreationUtils.createCas(tsd, null, null);
>         jcas.setDocumentText(aText);
>         
>         Type tokenType = jcas.getTypeSystem().getType("Token");
>         Feature beginFeature = tokenType.getFeatureByBaseName("begin");
>         
>         // Create tokens in CAS
>         for (int i = 0; i < aOffsets.length; i += 2) {
>             jcas.addFsToIndexes(jcas.createAnnotation(tokenType, aOffsets[i], aOffsets[i+1]));
>         }
>         
>         // List the tokens in the CAS
>         for (AnnotationFS token : jcas.getAnnotationIndex(tokenType)) {
>             System.out.printf("Starting with %s%n", token.getCoveredText());
>         }
>         // Merge some tokens, in particular "Franz" "III." -> "Franz III." and "Friedrich"
"II."
>         // into "Friedrich II."
>         AnnotationFS previous = null;
>         List<AnnotationFS> toDelete = new ArrayList<>();
>         for (AnnotationFS token : jcas.getAnnotationIndex(tokenType)) {
>             if (previous != null && names.contains(previous.getCoveredText())
>                     && suffix.contains(token.getCoveredText())) {
>                 token.setIntValue(beginFeature, previous.getBegin());
>                 toDelete.add(previous);
>             }
>             previous = token;
>         }
>         // Remove the no longer necessary tokens ("Friedrich" and "Franz"), since we
expanded the
>         // following tokens "III." and "II." to include their text
>         Set<String> removedWords = new HashSet<String>();
>         for (AnnotationFS token : toDelete) {
>             System.out.printf("Removing %s%n", token.getCoveredText());
>             removedWords.add(token.getCoveredText());
>             jcas.removeFsFromIndexes(token);
>         }
>         // Check if the tokens that we wanted to remove are really gone
>         for (AnnotationFS token : jcas.getAnnotationIndex(tokenType)) {
>             System.out.printf("Remaining %s%n", token.getCoveredText());
>             if (removedWords.contains(token.getCoveredText())) {
>                org.junit.Assert.fail("I saw a zombie!!!");
>             }
>         }
>     }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message