uima-dev mailing list archives

From "Miguel Alvarez" <miguelal...@gmail.com>
Subject RE: UIMA RUTA - Custom BLOCK extension
Date Wed, 23 Dec 2015 19:32:27 GMT
Hi Peter,

Sorry I didn't get back to you earlier. Explicit references to annotations
would be awesome. With those, the extension I was talking about no longer
makes any sense. I only tried to do it via an extension because that is the
only way I have extended RUTA so far; I haven't touched any of the core code
yet. The code is below but, as I said before, it wasn't really working (it
didn't apply the inner statements).

What do you mean by "right now you reference them with type expressions"?
Are you implying that it is already possible without explicit references?

Thanks again Peter!

Cheers,
Miguel

public class ArrayBlock extends RutaBlock {

  private String featureName = null;

  public ArrayBlock(String id, RutaBlock parent, String defaultNamespace) {
    super(parent, defaultNamespace, parent != null ? parent.getContext() : null);
    featureName = id;
  }

  @Override
  public ScriptApply apply(RutaStream stream, InferenceCrowd crowd) {
    BlockApply result = new BlockApply(this);
    crowd.beginVisit(this, result);
    RuleApply apply = rule.apply(stream, crowd, true);
    for (AbstractRuleMatch<? extends AbstractRule> eachMatch : apply.getList()) {
      if (eachMatch.matched()) {
        List<AnnotationFS> matchedAnnotations =
            ((RuleMatch) eachMatch).getMatchedAnnotations(null, null);
        if (matchedAnnotations == null || matchedAnnotations.isEmpty()) {
          continue;
        }
        AnnotationFS each = matchedAnnotations.get(0);
        if (each == null) {
          continue;
        }
        // Use the block identifier to specify the feature that contains the
        // array of annotations
        Feature feat = each.getType().getFeatureByBaseName(featureName);
        if (feat == null || !feat.getRange().isArray() || feat.getDomain().isPrimitive()) {
          // Throw an error
          this.getContext().getLogger().log(Level.SEVERE, String.format(
              "Unable to find feature or the one specified is not an array of annotations. Feature: %s",
              this.getName()));
          return null;
        }
        Type eachType = feat.getDomain();
        ArrayFS fsArray = (ArrayFS) each.getFeatureValue(feat);
        for (FeatureStructure fs : fsArray.toArray()) {
          AnnotationFS eachItem = (AnnotationFS) fs;
          RutaStream window = stream.getWindowStream(eachItem, eachType);
          for (RutaStatement element : getElements()) {
            if (element != null) {
              element.apply(window, crowd);
            }
          }
        }
      }
    }
    crowd.endVisit(this, result);
    return result;
  }
}
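
[Editorial sketch] The iteration that the ARRAYBLOCK is meant to perform can be modelled without UIMA. The code below is a toy, not Ruta code: Span, ArrayBlockSketch and applyToArrayItems are hypothetical names standing in for an annotation, the block, and its apply method. It also assumes that "changing the scope" simply means restricting the visible annotations to those covered by the current array item.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for an annotation: a type name plus begin/end offsets (hypothetical).
final class Span {
  final String type;
  final int begin;
  final int end;

  Span(String type, int begin, int end) {
    this.type = type;
    this.begin = begin;
    this.end = end;
  }

  // True if 'other' lies completely inside this span.
  boolean covers(Span other) {
    return begin <= other.begin && other.end <= end;
  }
}

final class ArrayBlockSketch {

  // For each item of the array feature, build a window (the annotations covered
  // by that item) and record what an inner statement would see inside it.
  static List<String> applyToArrayItems(List<Span> arrayFeature, List<Span> allAnnotations) {
    List<String> log = new ArrayList<>();
    for (Span item : arrayFeature) {
      List<String> visible = new ArrayList<>();
      for (Span candidate : allAnnotations) {
        if (item.covers(candidate)) {
          visible.add(candidate.type);
        }
      }
      log.add(item.type + "[" + item.begin + "," + item.end + "] sees " + visible);
    }
    return log;
  }
}
```

In this model each array item gets its own window, so an inner statement only ever sees the annotations covered by the item currently being visited.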
  


-----Original Message-----
From: Peter Klügl [mailto:peter.kluegl@averbis.com] 
Sent: December 17, 2015 12:47
To: dev@uima.apache.org
Subject: Re: UIMA RUTA - Custom BLOCK extension

Hi,

the first one looks like a bug. The block should behave as expected. 
I'll take a look at it.

About the array-block-extension: Yes, send me the code.

I am already working towards explicit references to annotations (right now
you reference them with type expressions) and support of Arrays.
Then, your use case will be directly supported without any extension, like

EntityRole.featureName->{LOG(Document.ct);};

or

a:EntityRole.featureName{-> LOG(a.ct)};

Best,

Peter
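
[Editorial sketch] The difference between referencing an annotation through a type expression and through an explicit reference can be illustrated with a small model. This is not Ruta's implementation: RoleAnn, resolveByType and resolveByLabel are hypothetical names, and the model assumes that a lookup by type only knows the type and the offsets of the current context, while an explicit reference is bound to the matched annotation itself.

```java
import java.util.ArrayList;
import java.util.List;

// Toy annotation with offsets and one feature value (hypothetical, not AnnotationFS).
final class RoleAnn {
  final int begin;
  final int end;
  final String relationId;

  RoleAnn(int begin, int end, String relationId) {
    this.begin = begin;
    this.end = end;
    this.relationId = relationId;
  }
}

final class ReferenceSketch {

  // Lookup by type: every annotation of the type inside the given window qualifies,
  // so several annotations on the same span cannot be told apart.
  static List<RoleAnn> resolveByType(List<RoleAnn> index, int begin, int end) {
    List<RoleAnn> found = new ArrayList<>();
    for (RoleAnn a : index) {
      if (begin <= a.begin && a.end <= end) {
        found.add(a);
      }
    }
    return found;
  }

  // Explicit reference: the label is bound to exactly the annotation that matched.
  static RoleAnn resolveByLabel(RoleAnn matched) {
    return matched;
  }
}
```

With five annotations on the same span, the lookup by type yields five candidates, whereas the label always yields the single matched annotation.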



On 16.12.2015 at 02:02, Miguel Alvarez wrote:
> Thanks again!
>
> I noticed some strange behaviour with the BLOCK statement while
> creating this new extension, and I am not sure it is correct. I
> thought these two statements should be equivalent:
>
> EntityRole{EntityRole.relationId == "5" -> LOG(EntityRole.relationId), 
> LOG(EntityRole.ct)};
>
> BLOCK(MyTest) EntityRole{EntityRole.relationId == "5"} {
>      LOG(EntityRole.relationId);
>      LOG(EntityRole.ct);
> }
>
> But it turns out that there are cases in which these two statements 
> are not equivalent, and the second one shows some strange behaviour.
>
> Suppose you have a string that has been annotated multiple times (let's
> say five times) with the same annotation type (in this case EntityRole),
> but for each of those annotations the feature relationId has a different
> value (values from 1 to 5). In this case the first statement will log two
> lines, as I was expecting: one with the relationId value of "5" and the
> other with the covered text of the annotation.
>
> But in the second statement (the BLOCK) the script will end up logging
> ten lines. The first five will contain the value of "5" for the
> relationId, and the last five will contain the covered text of the
> annotation.
>
> Have you come across this before? Why is it doing it five times even
> though there is only one annotation where the relationId feature is
> equal to 5? And the interesting part is the order in which it logs the
> information: first it logs the relationId feature value five times, and
> then it logs the covered text five times.
>
> Any ideas?
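
[Editorial sketch] The equivalence expected above can be written down as a toy model. It only captures the intended behaviour (one matching annotation, two log lines in either form) and says nothing about the cause of the ten-line output. RoleToy, ruleForm and blockForm are hypothetical names, and the covered text "Acme" used in the check is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Toy EntityRole: just the two values that the LOG actions print (hypothetical).
final class RoleToy {
  final String relationId;
  final String coveredText;

  RoleToy(String relationId, String coveredText) {
    this.relationId = relationId;
    this.coveredText = coveredText;
  }
}

final class BlockEquivalenceSketch {

  // Rule form: the condition and both actions belong to the same match.
  static List<String> ruleForm(List<RoleToy> roles) {
    List<String> log = new ArrayList<>();
    for (RoleToy r : roles) {
      if ("5".equals(r.relationId)) {
        log.add(r.relationId);
        log.add(r.coveredText);
      }
    }
    return log;
  }

  // Block form: the head condition selects the matches first, then every inner
  // statement runs once for each selected match.
  static List<String> blockForm(List<RoleToy> roles) {
    List<Function<RoleToy, String>> statements = new ArrayList<>();
    statements.add(r -> r.relationId);
    statements.add(r -> r.coveredText);

    List<RoleToy> matches = new ArrayList<>();
    for (RoleToy r : roles) {
      if ("5".equals(r.relationId)) {
        matches.add(r);
      }
    }
    List<String> log = new ArrayList<>();
    for (RoleToy match : matches) {
      for (Function<RoleToy, String> statement : statements) {
        log.add(statement.apply(match));
      }
    }
    return log;
  }
}
```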
>
> Talking about developing new extensions: the other extension I tried
> to develop, but got stuck on at some point, was another block that
> would iterate over the annotations contained in a feature array. But
> I am not sure how applicable this would be to the majority of users. For
> instance, let's say we have this:
>
> ARRAYBLOCK(featureName) EntityRole {
> 	Document{->LOG(Document.ct)};
> }
>
> In this case the annotation EntityRole will have a feature named
> featureName that contains an array of annotations, and the block will
> loop through the annotations in the array, also changing the scope to
> each of them. The only way I could find of specifying the feature
> that contains the array was using the block id, but then I keep on
> getting a warning saying that the type hasn't been defined.
>
> The problem I have is that it doesn't seem to be applying the elements 
> within the block. If you think this one can be interesting I can send 
> the code I have so far.
>
> Cheers,
> Miguel
>
> -----Original Message-----
> From: Peter Klügl [mailto:peter.kluegl@averbis.com]
> Sent: December 14, 2015 9:51
> To: dev@uima.apache.org
> Subject: Re: UIMA RUTA - Custom BLOCK extension
>
> Hi,
>
> On 14.12.2015 at 18:20, Miguel Alvarez wrote:
>> Thanks Peter! That is exactly what I was looking for. I think my code
>> wasn't working because of the way I was invoking the constructor,
>> which I didn't include in my previous emails. I assume since you have
>> included this in the code already, I don't need to do anything to
>> contribute it, right?
>
> Yes... but you are welcome to come up with other great ideas ;-)
>
> I will take care of the documentation. I am also thinking about adding the
> first (wrong) variant.
>
> Does anyone have ideas about the naming? If not it will remain 
> DOCUMENTBLOCK or something similar.
>
>
> Best,
>
> Peter
>
>> I hope this new block is useful to many other people.
>>
>> Thanks,
>> Miguel
>>
>> -----Original Message-----
>> From: Peter Klügl [mailto:peter.kluegl@averbis.com]
>> Sent: December 14, 2015 4:37
>> To: dev@uima.apache.org
>> Subject: Re: UIMA RUTA - Custom BLOCK extension
>>
>> Hi,
>>
>> sorry, I misinterpreted your use case.
>>
>> Yes, you are completely right and your code looks correct.
>>
>> If getList() does not return the matches, then either the rule wasn't
>> able to find any anchors at all to start matching, or apply was
>> called with false, meaning the matches are not stored for performance
>> reasons. You should be able to just delegate to the RutaScriptBlock with
>> a reset RutaStream:
>>     @Override
>>     public ScriptApply apply(RutaStream stream, InferenceCrowd crowd) {
>>       CAS cas = stream.getCas();
>>       AnnotationFS documentAnnotation = cas.getDocumentAnnotation();
>>       RutaStream completeStream =
>>           stream.getWindowStream(documentAnnotation, documentAnnotation.getType());
>>       ScriptApply result = super.apply(completeStream, crowd);
>>       return result;
>>     }
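
[Editorial sketch] The effect of resetting the stream to the whole document can be shown with a toy model of windowing. Ann and ToyStream are hypothetical classes, not the Ruta API. The model assumes that a window hides every annotation outside its offsets and that widening the view to the document makes them visible again. The example data follows the annotation sequence quoted further down the thread.

```java
import java.util.ArrayList;
import java.util.List;

// Toy annotation: a type name and character offsets (hypothetical, not AnnotationFS).
final class Ann {
  final String type;
  final int begin;
  final int end;

  Ann(String type, int begin, int end) {
    this.type = type;
    this.begin = begin;
    this.end = end;
  }
}

// Toy stream: all annotations of a document plus the window currently visible.
final class ToyStream {
  private final List<Ann> all;
  private final int docLength;
  private final int begin;
  private final int end;

  private ToyStream(List<Ann> all, int docLength, int begin, int end) {
    this.all = all;
    this.docLength = docLength;
    this.begin = begin;
    this.end = end;
  }

  static ToyStream ofDocument(List<Ann> all, int docLength) {
    return new ToyStream(new ArrayList<>(all), docLength, 0, docLength);
  }

  // Narrow the view to one annotation, as a block does for each match.
  ToyStream window(Ann a) {
    return new ToyStream(all, docLength, a.begin, a.end);
  }

  // Widen the view back to the whole document.
  ToyStream document() {
    return new ToyStream(all, docLength, 0, docLength);
  }

  // Count the annotations of a type that lie inside the current window.
  int count(String type) {
    int n = 0;
    for (Ann a : all) {
      if (a.type.equals(type) && begin <= a.begin && a.end <= end) {
        n++;
      }
    }
    return n;
  }
}
```

Inside the window of Annotation3 no Annotation2 is visible in this model. After widening to the document, all three occurrences are visible, which is the search scope the nested block needs.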
>>
>> I added this to the current trunk:
>> block impl:
>> https://svn.apache.org/repos/asf/uima/ruta/trunk/ruta-core-ext/src/main/java/org/apache/uima/ruta/block/DocumentBlock.java
>> unit test:
>> https://svn.apache.org/repos/asf/uima/ruta/trunk/ruta-core-ext/src/test/java/org/apache/uima/ruta/block/DocumentBlockTest.java
>>
>>
>> Does this work for you?
>>
>> Best,
>>
>> Peter
>>
>>
>> On 14.12.2015 at 04:14, Miguel Alvarez wrote:
>>> Hi Peter,
>>>
>>>    
>>>
>>> Thanks for your prompt reply.
>>>
>>>    
>>>
>>> Let me know if I am wrong, but I don’t think the code you sent would
>>> work if the custom BLOCK extension is nested inside
>>> another block. For instance, let’s say we have these annotations in
>>> some text:
>>>    
>>>
>>> Annotation1 Annotation2 Annotation3 Annotation2 Annotation4 Annotation2
>>>
>>>    
>>>
>>> BLOCK Annotation3{} {
>>>       // Extract some information from Annotation3’s features and
>>>       // store it in variables
>>>       DOCUMENTBLOCK Annotation2{} {
>>>           // Use the information extracted from Annotation3 to
>>>           // determine if this particular Annotation2 is the one I want
>>>       }
>>> }
>>>
>>>    
>>>
>>> And I actually want the custom BLOCK extension to have the right
>>> context when within the BLOCK. So I want the DOCUMENTBLOCK extension
>>> to look for Annotation2 in the whole document, but once you are inside
>>> the DOCUMENTBLOCK, Annotation2 should be the new scope (as the current
>>> BLOCK statement does right now).
>>>
>>>    
>>>
>>> So initially this is the code I tried:
>>>
>>>    
>>>
>>>      public ScriptApply apply(RutaStream stream, InferenceCrowd crowd) {
>>>        // Create a new stream of the whole document
>>>        RutaStream docStream = stream.getWindowStream(
>>>            stream.getCas().getDocumentAnnotation(),
>>>            stream.getCas().getDocumentAnnotation().getType());
>>>        BlockApply result = new BlockApply(this);
>>>        crowd.beginVisit(this, result);
>>>        RuleApply apply = rule.apply(docStream, crowd, true);
>>>        for (AbstractRuleMatch<? extends AbstractRule> eachMatch : apply.getList()) {
>>>          if (eachMatch.matched()) {
>>>            List<AnnotationFS> matchedAnnotations =
>>>                ((RuleMatch) eachMatch).getMatchedAnnotations(null, null);
>>>            if (matchedAnnotations == null || matchedAnnotations.isEmpty()) {
>>>              continue;
>>>            }
>>>            AnnotationFS each = matchedAnnotations.get(0);
>>>            if (each == null) {
>>>              continue;
>>>            }
>>>
>>>            List<Type> types = ((RutaRuleElement) rule.getRuleElements().get(0))
>>>                .getMatcher().getTypes(getParent() == null ? this : getParent(), docStream);
>>>            for (Type eachType : types) {
>>>              RutaStream window = docStream.getWindowStream(each, eachType);
>>>              for (RutaStatement element : getElements()) {
>>>                if (element != null) {
>>>                  element.apply(window, crowd);
>>>                }
>>>              }
>>>            }
>>>          }
>>>        }
>>>        crowd.endVisit(this, result);
>>>        return result;
>>>      }
>>>
>>>    
>>>
>>> I thought I would just get a new stream that covers the whole 
>>> document, and apply the rules to that but the call “apply.getList()”
>>> would never return anything even though I don’t have any conditions 
>>> in the RUTA script for the DOCUMENTBLOCK extension. And that is why 
>>> I ended up calling the method getAllOfType, because that one was 
>>> working fine, but of course, it doesn’t apply the conditions.
>>>
>>>    
>>>
>>> Any ideas why the “getList” wouldn’t return anything even though I 
>>> am passing a new stream that covers the whole document?
>>>
>>>    
>>>
>>> If I get this to work, I have no problems contributing it to the 
>>> UIMA RUTA project.
>>>
>>>    
>>>
>>> Cheers,
>>>
>>> Miguel
>>>
>>>    
>>>
>>>    
>>>
>>> From: Peter Klügl <peter.kluegl@...>
>>> Subject: Re: UIMA RUTA - Custom BLOCK extension
>>> Newsgroups: gmane.comp.apache.uima.devel
>>> Date: 2015-12-13 11:51:42 GMT
>>>
>>> Hi,
>>>    
>>> oh yes, this is a nice extension. I was also already planning to add
>>> something like this, but in my use cases the explicit referencing of
>>> each matched annotation in the global context was missing. Thus, I am
>>> implementing the annotation issues first.
>>>    
>>> It is possible to specify something like this right now in UIMA Ruta 
>>> but I would not recommend it. You could either spam/remove 
>>> annotations on the complete document or you could use the recursion 
>>> functionality of BLOCKs.
>>>    
>>> Now to the custom block:
>>>    
>>> You need to apply the head rule of the block in order to evaluate 
>>> the conditions. The scope is changed by the usage of a new 
>>> restricted RutaStream (windowStream). In order to retain the scope, 
>>> just use the given RutaStream.
>>>    
>>> Without having tested it, it could look something like:
>>>    
>>>      @Override
>>>      public ScriptApply apply(RutaStream stream, InferenceCrowd crowd) {
>>>        BlockApply result = new BlockApply(this);
>>>        crowd.beginVisit(this, result);
>>>        RuleApply apply = rule.apply(stream, crowd, true);
>>>        for (AbstractRuleMatch<? extends AbstractRule> eachMatch : apply.getList()) {
>>>          if (eachMatch.matched()) {
>>>            for (RutaStatement element : getElements()) {
>>>              if (element != null) {
>>>                element.apply(stream, crowd);
>>>              }
>>>            }
>>>          }
>>>        }
>>>        crowd.endVisit(this, result);
>>>        return result;
>>>      }
>>>    
>>> Let me know if this helps.
>>>    
>>> Do you want to contribute the block extension?
>>>    
>>> Best,
>>>    
>>> Peter
>>>    
>>> On 12.12.2015 at 00:04, Miguel Alvarez wrote:
>>>> Hi,
>>>>
>>>>     
>>>>
>>>> I am in the process of developing a custom BLOCK extension that,
>>>> instead of changing the scope of the block, uses the scope of the
>>>> whole Document. With this type of BLOCK one could loop through a series
>>>> of annotations, and for each of those annotations search in the whole
>>>> document for something else. I guess my first question is: Is it even
>>>> possible to do something like this without creating a custom BLOCK
>>>> extension?
>>>>
>>>>     
>>>>
>>>> I got something to work, but it doesn't seem to apply the
>>>> conditions for the block. This is more or less the code I have so far:
>>>>
>>>>     
>>>>
>>>>     List<Type> types = ((RutaRuleElement) rule.getRuleElements().get(0))
>>>>         .getMatcher().getTypes(getParent() == null ? this : getParent(), stream);
>>>>     for (Type eachType : types) {
>>>>       //System.out.println("each Type: " + eachType.getShortName());
>>>>       for (AnnotationFS each : stream.getAllofType(eachType)) {
>>>>         RutaStream window = stream.getWindowStream(each, eachType);
>>>>         for (RutaStatement element : getElements()) {
>>>>           if (element != null) {
>>>>             element.apply(window, crowd);
>>>>           }
>>>>         }
>>>>       }
>>>>     }
>>>>
>>>>     
>>>>
>>>> I assume in order to apply the conditions I would need something
>>>> like this:
>>>>
>>>>     RuleApply apply = rule.apply(stream, crowd);
>>>>
>>>> But for some reason this doesn't work, because I guess the scope
>>>> has already been changed and it is not able to find any of the
>>>> annotations within the scope.
>>>>
>>>>     
>>>>
>>>> Does this make any sense? Is there a better way to do this?
>>>>
>>>>     
>>>>
>>>> Any help would be much appreciated.
>>>>
>>>>     
>>>>
>>>> Cheers,
>>>>
>>>> Miguel
>>>>
>>>>
>>>    
>>>
>>>
>


