uima-dev mailing list archives

From Peter Klügl (JIRA) <...@uima.apache.org>
Subject [jira] [Commented] (UIMA-2758) TextMarker: Provide support for tree structures and parse trees in rule language
Date Mon, 13 May 2013 09:29:16 GMT

    [ https://issues.apache.org/jira/browse/UIMA-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13655866#comment-13655866 ]

Peter Klügl commented on UIMA-2758:
-----------------------------------

There is one thing I keep thinking about:
How does the feature match influence the sequential matching? Or better: is there only one
reasonable interpretation of the sequential matching?

Here's an example to talk about (the test case I am using right now to develop that stuff):

Input text:
{noformat}
Peter Kluegl, Joern Kottmann, Marshall Schor
{noformat}

Rules:
{noformat}
PACKAGE org.apache.uima.ruta;

//A = full name
//B = last name
//C = first name
DECLARE Annotation D(STRING ds);
DECLARE D C(INT ci, BOOLEAN cb);
DECLARE D B(C bc);
DECLARE Annotation A(B ab, C ac);

INT count;
CW{ -> ASSIGN(count, count+1), CREATE(C, "ds" = "firstname", "ci" = count, "cb" = false)}
CW{ -> 
    GATHER(B, "bc" = 1), FILL(B, "ds" = "lastname")};
C{REGEXP("M.*") -> SETFEATURE("cb", true)};
(CW CW){-> CREATE(A, "ab" = B, "ac" = C)};
{noformat}

So, if I write a rule like:

{noformat}
(A.ac.ci==1 # A.ac.ci==2 # A.ac.ci==3);
{noformat}

... then what should the wildcard (#) match on?
Right now, only the annotation that is actually used in the sequential matching determines
the possible annotations of the next rule element. Therefore, the first wildcard matches on
" Kluegl, ", because "ac" is only the first name. One might instead expect the rule element
to match on the complete name, since the rule element starts with "A", which refers to the
complete name. The rule as a whole would now cover "Peter Kluegl, Joern Kottmann, Marshall"
(missing " Schor"). Is this behavior intelligible/reasonable to others?
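
To make the current behavior concrete, here is a small Python sketch that only recomputes the character spans described above. It is a toy model, not Ruta and not the engine's implementation: the regular expression standing in for CW, the pairing of words into names, and the variable names are all assumptions for illustration, and any token filtering done by the engine is ignored.

```python
import re

text = "Peter Kluegl, Joern Kottmann, Marshall Schor"

# Assumption: capitalized words stand in for CW tokens, in document order.
words = [m.span() for m in re.finditer(r"[A-Z][a-z]+", text)]
# Every other word is a first name, i.e. a C annotation with ci = 1, 2, 3.
first_names = words[0::2]

# Current behavior: each rule element is anchored on the feature value (ac),
# so each wildcard covers the text between two consecutive first names.
gaps = [text[a_end:b_begin]
        for (_, a_end), (b_begin, _) in zip(first_names, first_names[1:])]

# The rule as a whole spans from the first to the last matched first name.
rule_match = text[first_names[0][0]:first_names[-1][1]]

print(gaps)        # [' Kluegl, ', ' Kottmann, ']
print(rule_match)  # Peter Kluegl, Joern Kottmann, Marshall
```

The last name of the third person falls outside the match because nothing after the third first name is part of any rule element.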

Well, I can imagine use cases where the important match is not that of the annotation stored
in the feature, but that of the annotation containing the feature.

I could think of a solution that introduces some operator enabling navigation in the feature
structure for different parts of a rule element, but that does not seem really straightforward.

My favorite solution would be a simple extension: Allow deep feature checks as conditions.

{noformat}
(A{A.ac.ci==1} # A{A.ac.ci==2} # A{A.ac.ci==3});
{noformat}

Here, the wildcards would only match on " , ". A.ac.ci==1 could be interpreted as an IS condition
combined with a FEATURE condition.
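
As a rough check of that claim, the following Python sketch recomputes the spans when each rule element is anchored on the containing annotation A (the full name) instead of on the feature value. Again this is only a toy model under stated assumptions: the regular expression standing in for CW, the pairing of words into full names, and the variable names are illustrative, and it works on raw character offsets without any of the engine's token filtering.

```python
import re

text = "Peter Kluegl, Joern Kottmann, Marshall Schor"

# Assumption: capitalized words stand in for CW tokens, in document order.
words = [m.span() for m in re.finditer(r"[A-Z][a-z]+", text)]
# A annotations: each spans a first name plus the following last name.
full_names = [(first[0], last[1])
              for first, last in zip(words[0::2], words[1::2])]

# Proposed behavior: the rule element matches on A itself and the deep
# feature check is only a condition, so wildcards sit between full names.
gaps = [text[a_end:b_begin]
        for (_, a_end), (b_begin, _) in zip(full_names, full_names[1:])]

# The rule as a whole now spans from the first to the last full name.
rule_match = text[full_names[0][0]:full_names[-1][1]]

print(gaps)
print(rule_match == text)
```

On raw offsets each gap is just the comma and the following space, and the rule covers the complete input, including the last name of the third person.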

Are there any opinions about this problem? I should search for some real use cases with parse
trees.

                
> TextMarker: Provide support for tree structures and parse trees in rule language
> --------------------------------------------------------------------------------
>
>                 Key: UIMA-2758
>                 URL: https://issues.apache.org/jira/browse/UIMA-2758
>             Project: UIMA
>          Issue Type: New Feature
>          Components: ruta
>            Reporter: Peter Klügl
>            Assignee: Peter Klügl
>
> Manipulation of features which refer to annotations and matching on simple features is
> currently supported, but matching on the complex values of some feature is not. A first step
> can be something like (Type Person with feature "title" of type Annotation):
> Person.title;
> This rule matches on all annotations that are values of features of annotations of
> the type Person.
> This new language element can also be used as syntactic sugar when checking primitive
> feature values:
> Person.begin=0 (a Person annotation that starts at offset 0)
> This can only be a first step towards supporting tree structures. Maybe there is no way
> around something for explicitly and directly referring to certain annotations (which is not
> possible right now, but is done by using the type).

