uima-dev mailing list archives

From Peter Klügl <peter.klu...@averbis.com>
Subject Re: UIMA RUTA - Custom BLOCK extension
Date Thu, 17 Dec 2015 21:45:11 GMT
Yes, it was a bug. It's fixed now and here's the test:
https://svn.apache.org/repos/asf/uima/ruta/trunk/ruta-core/src/test/java/org/apache/uima/ruta/BlockTest.java

Thanks for pointing it out :-)

Best,

Peter

On 17.12.2015 at 21:47, Peter Klügl wrote:
> Hi,
>
> the first one looks like a bug. The block should behave as expected. 
> I'll take a look at it.
>
> About the array-block-extension: Yes, send me the code.
>
> I am already working towards explicit references to annotations (right now
> you reference them with type expressions) and support for arrays. Your use
> case will then be directly supported without any extension, like
>
> EntityRole.featureName->{LOG(Document.ct);};
>
> or
>
> a:EntityRole.featureName{-> LOG(a.ct)};
>
> Best,
>
> Peter
>
>
>
> On 16.12.2015 at 02:02, Miguel Alvarez wrote:
>> Thanks again!
>>
>> I noticed some strange behaviour with the BLOCK statement while creating
>> this new extension, and I am not sure it is correct. I thought these two
>> statements should be equivalent:
>>
>> EntityRole{EntityRole.relationId == "5" -> LOG(EntityRole.relationId),
>> LOG(EntityRole.ct)};
>>
>> BLOCK(MyTest) EntityRole{EntityRole.relationId == "5"} {
>>      LOG(EntityRole.relationId);
>>      LOG(EntityRole.ct);
>> }
>>
>> But it turns out that there are cases in which these two statements are
>> not equivalent, and the second one shows some strange behaviour.
>>
>> Suppose you have a string that has been annotated multiple times (let's
>> say five times) with the same annotation type (in this case EntityRole),
>> but the feature relationId has a different value for each of those
>> annotations (values from 1 to 5). In this case the first statement logs
>> two lines, as I was expecting: one with the relationId value of "5" and
>> the other with the covered text of the annotation.
>>
>> But with the second statement (the BLOCK) the script ends up logging ten
>> lines. The first five contain the value "5" for the relationId, and the
>> last five contain the covered text of the annotation.
>>
>> Have you come across this before? Why is it doing it five times even
>> though there is only one annotation where the relationId feature is equal
>> to 5? And the interesting part is the order in which it logs the
>> information: first it logs the relationId feature value five times, and
>> then it logs the covered text five times.
>>
>> Any ideas?
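
The behaviour expected from both statements above can be pictured with a small self-contained Java model. This is only a sketch: ToyEntityRole, BlockEquivalenceSketch and their methods are hypothetical names made up for illustration, nothing here uses the UIMA or Ruta APIs, and it models the expected semantics rather than Ruta's actual matching.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical stand-in for an EntityRole annotation; not a UIMA or Ruta class.
final class ToyEntityRole {
  final String relationId;
  final String coveredText;

  ToyEntityRole(String relationId, String coveredText) {
    this.relationId = relationId;
    this.coveredText = coveredText;
  }
}

final class BlockEquivalenceSketch {

  // Models `EntityRole{condition -> action1, action2};`:
  // every annotation that passes the condition runs all actions in order.
  static List<String> applyRule(List<ToyEntityRole> annotations,
      Predicate<ToyEntityRole> condition,
      List<Function<ToyEntityRole, String>> actions) {
    List<String> log = new ArrayList<>();
    for (ToyEntityRole annotation : annotations) {
      if (condition.test(annotation)) {
        for (Function<ToyEntityRole, String> action : actions) {
          log.add(action.apply(annotation));
        }
      }
    }
    return log;
  }

  // Models `BLOCK(id) EntityRole{condition} { stmt1; stmt2; }` as expected:
  // only heads that pass the condition open the body, once per matching head.
  static List<String> applyBlock(List<ToyEntityRole> annotations,
      Predicate<ToyEntityRole> headCondition,
      List<Function<ToyEntityRole, String>> statements) {
    List<String> log = new ArrayList<>();
    for (ToyEntityRole head : annotations) {
      if (!headCondition.test(head)) {
        continue; // head rule did not match, so the body is skipped
      }
      for (Function<ToyEntityRole, String> statement : statements) {
        log.add(statement.apply(head));
      }
    }
    return log;
  }

  // Five annotations over the same text with relationId "1" to "5".
  static List<ToyEntityRole> sample() {
    List<ToyEntityRole> annotations = new ArrayList<>();
    for (int i = 1; i <= 5; i++) {
      annotations.add(new ToyEntityRole(String.valueOf(i), "some covered text"));
    }
    return annotations;
  }

  // Stand-ins for LOG(EntityRole.relationId) and LOG(EntityRole.ct).
  static List<Function<ToyEntityRole, String>> logActions() {
    List<Function<ToyEntityRole, String>> actions = new ArrayList<>();
    actions.add(a -> a.relationId);
    actions.add(a -> a.coveredText);
    return actions;
  }

  static List<String> demoRule() {
    return applyRule(sample(), a -> a.relationId.equals("5"), logActions());
  }

  static List<String> demoBlock() {
    return applyBlock(sample(), a -> a.relationId.equals("5"), logActions());
  }

  public static void main(String[] args) {
    System.out.println(demoRule());  // prints [5, some covered text]
    System.out.println(demoBlock()); // prints [5, some covered text]
  }
}
```

In this model both forms log two lines, which is the outcome described as expected above; the ten-line output reported for the BLOCK form is what deviates from it.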
>>
>> Speaking of developing new extensions: the other extension I tried to
>> develop, but got stuck on at some point, was another block that would
>> iterate over the annotations contained in a feature array. But I am not
>> sure how applicable this would be to the majority of users. For instance,
>> let's say we have this:
>>
>> ARRAYBLOCK(featureName) EntityRole {
>>     Document{->LOG(Document.ct)};
>> }
>>
>> In this case the annotation EntityRole has a feature named featureName
>> that contains an array of annotations, and the block loops through the
>> annotations in that array, also changing the scope to each of them. The
>> only way I could find of specifying the feature that contains the array
>> was using the block id, but then I keep getting a warning saying that the
>> type hasn't been defined.
>>
>> The problem I have is that it doesn't seem to be applying the elements
>> within the block. If you think this one could be interesting, I can send
>> the code I have so far.
>>
>> Cheers,
>> Miguel
>>
>> -----Original Message-----
>> From: Peter Klügl [mailto:peter.kluegl@averbis.com]
>> Sent: December 14, 2015 9:51
>> To: dev@uima.apache.org
>> Subject: Re: UIMA RUTA - Custom BLOCK extension
>>
>> Hi,
>>
>> On 14.12.2015 at 18:20, Miguel Alvarez wrote:
>>> Thanks Peter! That is exactly what I was looking for. I think my code
>>> wasn't working because of the way I was invoking the constructor,
>>> which I didn't include in my previous emails. I assume since you have
>>> included this in the code already, I don't need to do anything to
>>> contribute it, right?
>>
>> Yes... but you are welcome to come up with other great ideas ;-)
>>
>> I will take care of the documentation. I am also thinking about adding the
>> first (wrong) variant.
>>
>> Does anyone have ideas about the naming? If not, it will remain
>> DOCUMENTBLOCK or something similar.
>>
>>
>> Best,
>>
>> Peter
>>
>>> I hope this new block is useful to many other people.
>>>
>>> Thanks,
>>> Miguel
>>>
>>> -----Original Message-----
>>> From: Peter Klügl [mailto:peter.kluegl@averbis.com]
>>> Sent: December 14, 2015 4:37
>>> To: dev@uima.apache.org
>>> Subject: Re: UIMA RUTA - Custom BLOCK extension
>>>
>>> Hi,
>>>
>>> sorry, I misinterpreted your use case.
>>>
>>> Yes, you are completely right and your code looks correct.
>>>
>>> If getList() does not return the matches, then either the rule wasn't
>>> able to find any anchors at all to start matching, or the apply was
>>> called with false, meaning the matches are not stored for performance
>>> reasons. You should be able to just delegate to the RutaScriptBlock with
>>> a reset RutaStream:
>>>     @Override
>>>     public ScriptApply apply(RutaStream stream, InferenceCrowd crowd) {
>>>       // Widen the stream to the whole document before delegating.
>>>       CAS cas = stream.getCas();
>>>       AnnotationFS documentAnnotation = cas.getDocumentAnnotation();
>>>       RutaStream completeStream =
>>>           stream.getWindowStream(documentAnnotation, documentAnnotation.getType());
>>>       ScriptApply result = super.apply(completeStream, crowd);
>>>       return result;
>>>     }
>>>
>>> I added this to the current trunk:
>>> block impl:
>>> https://svn.apache.org/repos/asf/uima/ruta/trunk/ruta-core-ext/src/main/java/org/apache/uima/ruta/block/DocumentBlock.java
>>> unit test:
>>> https://svn.apache.org/repos/asf/uima/ruta/trunk/ruta-core-ext/src/test/java/org/apache/uima/ruta/block/DocumentBlockTest.java
>>>
>>>
>>> Does this work for you?
>>>
>>> Best,
>>>
>>> Peter
>>>
>>>
>>> On 14.12.2015 at 04:14, Miguel Alvarez wrote:
>>>> Hi Peter,
>>>>
>>>>
>>>> Thanks for your prompt reply.
>>>>
>>>>
>>>> Let me know if I am wrong, but I don’t think the code you sent would
>>>> work if the custom BLOCK extension is nested inside another block. For
>>>> instance, let’s say we have these annotations in some text:
>>>>
>>>> Annotation1 Annotation2 Annotation3 Annotation2 Annotation4 Annotation2
>>>>
>>>>
>>>> BLOCK Annotation3{} {
>>>>       // Extract some information from Annotation3’s features and
>>>>       // store them in variables
>>>>
>>>>       DOCUMENTBLOCK Annotation2{} {
>>>>           // Use the information extracted from Annotation3 to determine
>>>>           // if this particular Annotation2 is the one I want
>>>>       }
>>>> }
>>>>
>>>>
>>>> And I actually want the custom BLOCK extension to have the right
>>>> context when nested within the BLOCK. So I want the DOCUMENTBLOCK
>>>> extension to look for Annotation2 in the whole document, but once you
>>>> are inside the DOCUMENTBLOCK, Annotation2 should be the new scope (as
>>>> the current BLOCK statement does right now).
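
The scoping described above can be modelled without any Ruta classes. The following is a minimal sketch under stated assumptions: ToySpan and DocumentBlockScopeSketch are hypothetical names, the offsets are invented for the six example annotations, and a "window" is reduced to a begin/end range. It is not how RutaStream is implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an annotation: a type name plus begin/end offsets.
final class ToySpan {
  final String type;
  final int begin;
  final int end;

  ToySpan(String type, int begin, int end) {
    this.type = type;
    this.begin = begin;
    this.end = end;
  }
}

final class DocumentBlockScopeSketch {

  // Annotations of the given type that lie completely inside [begin, end].
  static List<ToySpan> select(List<ToySpan> all, String type, int begin, int end) {
    List<ToySpan> result = new ArrayList<>();
    for (ToySpan span : all) {
      if (span.type.equals(type) && span.begin >= begin && span.end <= end) {
        result.add(span);
      }
    }
    return result;
  }

  // Plain BLOCK: head candidates come from the enclosing window only.
  static List<ToySpan> blockHeads(List<ToySpan> all, ToySpan enclosing, String type) {
    return select(all, type, enclosing.begin, enclosing.end);
  }

  // DOCUMENTBLOCK as requested in the thread: head candidates come from the
  // whole document, regardless of the enclosing window.
  static List<ToySpan> documentBlockHeads(List<ToySpan> all, int documentLength, String type) {
    return select(all, type, 0, documentLength);
  }

  // Annotation1 Annotation2 Annotation3 Annotation2 Annotation4 Annotation2,
  // with made-up offsets; the document is assumed to span [0, 11].
  static List<ToySpan> sample() {
    List<ToySpan> spans = new ArrayList<>();
    spans.add(new ToySpan("Annotation1", 0, 1));
    spans.add(new ToySpan("Annotation2", 2, 3));
    spans.add(new ToySpan("Annotation3", 4, 5));
    spans.add(new ToySpan("Annotation2", 6, 7));
    spans.add(new ToySpan("Annotation4", 8, 9));
    spans.add(new ToySpan("Annotation2", 10, 11));
    return spans;
  }

  public static void main(String[] args) {
    List<ToySpan> all = sample();
    ToySpan outer = all.get(2); // the Annotation3 that opened the outer BLOCK

    // Nested plain BLOCK: no Annotation2 lies inside Annotation3.
    System.out.println(blockHeads(all, outer, "Annotation2").size()); // prints 0

    // Nested DOCUMENTBLOCK: all three Annotation2 are found.
    List<ToySpan> heads = documentBlockHeads(all, 11, "Annotation2");
    System.out.println(heads.size()); // prints 3

    // Inside the body, each head becomes the new, narrower scope.
    for (ToySpan head : heads) {
      System.out.println(select(all, "Annotation2", head.begin, head.end).size()); // prints 1
    }
  }
}
```

The design point is that only the head lookup ignores the enclosing window; the body still gets a window restricted to the matched annotation, as with a regular BLOCK.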
>>>>
>>>>
>>>> So initially this is the code I tried:
>>>>
>>>>
>>>>      public ScriptApply apply(RutaStream stream, InferenceCrowd crowd) {
>>>>        // Create a new stream of the whole document
>>>>        RutaStream docStream =
>>>>            stream.getWindowStream(stream.getCas().getDocumentAnnotation(),
>>>>                stream.getCas().getDocumentAnnotation().getType());
>>>>        BlockApply result = new BlockApply(this);
>>>>        crowd.beginVisit(this, result);
>>>>        RuleApply apply = rule.apply(docStream, crowd, true);
>>>>        for (AbstractRuleMatch<? extends AbstractRule> eachMatch : apply.getList()) {
>>>>          if (eachMatch.matched()) {
>>>>            List<AnnotationFS> matchedAnnotations =
>>>>                ((RuleMatch) eachMatch).getMatchedAnnotations(null, null);
>>>>            if (matchedAnnotations == null || matchedAnnotations.isEmpty()) {
>>>>              continue;
>>>>            }
>>>>            AnnotationFS each = matchedAnnotations.get(0);
>>>>            if (each == null) {
>>>>              continue;
>>>>            }
>>>>            List<Type> types = ((RutaRuleElement) rule.getRuleElements().get(0))
>>>>                .getMatcher().getTypes(getParent() == null ? this : getParent(), docStream);
>>>>            for (Type eachType : types) {
>>>>              RutaStream window = docStream.getWindowStream(each, eachType);
>>>>              for (RutaStatement element : getElements()) {
>>>>                if (element != null) {
>>>>                  element.apply(window, crowd);
>>>>                }
>>>>              }
>>>>            }
>>>>          }
>>>>        }
>>>>        crowd.endVisit(this, result);
>>>>        return result;
>>>>      }
>>>>
>>>>
>>>> I thought I would just get a new stream that covers the whole
>>>> document and apply the rules to that, but the call “apply.getList()”
>>>> never returned anything, even though I don’t have any conditions in
>>>> the RUTA script for the DOCUMENTBLOCK extension. That is why I ended
>>>> up calling the method getAllOfType, because that one was working fine,
>>>> but of course it doesn’t apply the conditions.
>>>>
>>>>
>>>> Any ideas why “getList” wouldn’t return anything even though I am
>>>> passing a new stream that covers the whole document?
>>>>
>>>>
>>>> If I get this to work, I have no problems contributing it to the UIMA
>>>> RUTA project.
>>>>
>>>>
>>>> Cheers,
>>>>
>>>> Miguel
>>>>
>>>>
>>>>
>>>> From: Peter Klügl <peter.kluegl@...>
>>>> Subject: Re: UIMA RUTA - Custom BLOCK extension
>>>> Newsgroups: gmane.comp.apache.uima.devel
>>>> Date: 2015-12-13 11:51:42 GMT
>>>>
>>>> Hi,
>>>>    oh yes, this is a nice extension. I was also already planning to add
>>>> something like this, but in my use cases the explicit referencing of
>>>> each matched annotation in the global context was missing. Thus, I am
>>>> implementing the annotation issues first.
>>>>    It is possible to specify something like this right now in UIMA Ruta,
>>>> but I would not recommend it. You could either spam/remove annotations
>>>> on the complete document or you could use the recursion functionality
>>>> of BLOCKs.
>>>>    Now to the custom block:
>>>>    You need to apply the head rule of the block in order to evaluate the
>>>> conditions. The scope is changed by the usage of a new restricted
>>>> RutaStream (windowStream). In order to retain the scope, just use the
>>>> given RutaStream.
>>>>    Without having tested it, it could look something like:
>>>>      @Override
>>>>      public ScriptApply apply(RutaStream stream, InferenceCrowd crowd) {
>>>>        BlockApply result = new BlockApply(this);
>>>>        crowd.beginVisit(this, result);
>>>>        // Apply the head rule so that its conditions are evaluated.
>>>>        RuleApply apply = rule.apply(stream, crowd, true);
>>>>        for (AbstractRuleMatch<? extends AbstractRule> eachMatch : apply.getList()) {
>>>>          if (eachMatch.matched()) {
>>>>            // Keep the given stream so the scope is not narrowed.
>>>>            for (RutaStatement element : getElements()) {
>>>>              if (element != null) {
>>>>                element.apply(stream, crowd);
>>>>              }
>>>>            }
>>>>          }
>>>>        }
>>>>        crowd.endVisit(this, result);
>>>>        return result;
>>>>      }
>>>>    Let me know if this helps.
>>>>    Do you want to contribute the block extension?
>>>>    Best,
>>>>    Peter
>>>>    On 12.12.2015 at 00:04, Miguel Alvarez wrote:
>>>>> Hi,
>>>>>
>>>>>
>>>>> I am in the process of developing a custom BLOCK extension that,
>>>>> instead of changing the scope of the block, uses the scope of the
>>>>> whole Document. With this type of BLOCK one could loop through a
>>>>> series of annotations, and for each of those annotations search the
>>>>> whole document for something else. I guess my first question is: Is it
>>>>> even possible to do something like this without creating a custom
>>>>> BLOCK extension?
>>>>>
>>>>>
>>>>> I got something to work, but it doesn't seem to apply the conditions
>>>>> for the block. This is more or less the code I have so far:
>>>>>
>>>>>
>>>>>     List<Type> types = ((RutaRuleElement) rule.getRuleElements().get(0))
>>>>>         .getMatcher().getTypes(getParent() == null ? this : getParent(), stream);
>>>>>     for (Type eachType : types) {
>>>>>       //System.out.println("each Type: " + eachType.getShortName());
>>>>>       for (AnnotationFS each : stream.getAllofType(eachType)) {
>>>>>         RutaStream window = stream.getWindowStream(each, eachType);
>>>>>         for (RutaStatement element : getElements()) {
>>>>>           if (element != null) {
>>>>>             element.apply(window, crowd);
>>>>>           }
>>>>>         }
>>>>>       }
>>>>>     }
>>>>>
>>>>>
>>>>> I assume in order to apply the conditions I would need something like
>>>>> this:
>>>>>
>>>>>                  RuleApply apply = rule.apply(stream, crowd);
>>>>>
>>>>>
>>>>> But for some reason this doesn't work, I guess because the scope has
>>>>> already been changed and it is not able to find any of the annotations
>>>>> within the scope.
>>>>>
>>>>>
>>>>> Does this make any sense? Is there a better way to do this?
>>>>>
>>>>>
>>>>> Any help would be much appreciated.
>>>>>
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Miguel
>>>>>
>>>>>
>>>>
>>>>
>>
>

