uima-dev mailing list archives

From "Mario Juric (Jira)" <...@uima.apache.org>
Subject [jira] [Created] (UIMA-6137) Type-based filtering in Ruta rules
Date Wed, 23 Oct 2019 10:58:00 GMT
Mario Juric created UIMA-6137:

             Summary: Type-based filtering in Ruta rules
                 Key: UIMA-6137
                 URL: https://issues.apache.org/jira/browse/UIMA-6137
             Project: UIMA
          Issue Type: New Feature
          Components: Ruta
            Reporter: Mario Juric

The visibility concept in Ruta is not type-based but type coverage-based, which means that
filtered types hide the area they cover from the Ruta rules, i.e. these areas become invisible
to the rules.

We have a use case where we only want to hide the types themselves from the rules, not the
text area they cover: other types found in these areas should still be considered by the
rules.
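
To make the difference concrete, here is a toy model in Python. It is neither Ruta code nor
Ruta's actual implementation: the Annotation class, the two helper functions, and the
Title/Keyword types are all hypothetical, and "hidden" is simplified to "overlaps a span of
a filtered type". It only sketches the two semantics described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical stand-in for a typed annotation over a text span."""
    type: str
    begin: int
    end: int


def overlaps(a: Annotation, b: Annotation) -> bool:
    """True if the two spans share at least one character offset."""
    return a.begin < b.end and b.begin < a.end


def visible_coverage_based(annotations, filtered_types):
    """Simplified model of the current behaviour: a filtered type hides
    the whole area it covers, so anything overlapping that area is hidden."""
    hidden_areas = [a for a in annotations if a.type in filtered_types]
    return [a for a in annotations
            if not any(overlaps(a, h) for h in hidden_areas)]


def visible_type_based(annotations, filtered_types):
    """Model of the requested behaviour: only annotations of the filtered
    types are hidden; other annotations in the same area stay visible."""
    return [a for a in annotations if a.type not in filtered_types]


# Assumed example data: a Title span with one Keyword inside it
# and one Keyword outside it.
annotations = [
    Annotation("Title", 0, 20),
    Annotation("Keyword", 5, 12),
    Annotation("Keyword", 30, 38),
]

coverage_result = visible_coverage_based(annotations, {"Title"})
type_result = visible_type_based(annotations, {"Title"})
```

With coverage-based filtering on Title, the Keyword inside the title disappears together with
the Title annotation; with type-based filtering, both Keyword annotations remain visible and
only the Title annotation itself is ignored.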

We use Ruta as part of the normalization process, where different text areas are marked
with annotations associated with the tags in the original content (title, abstract/summary,
body, COI, authors, citations etc.), and Ruta is part of the parsing process that produces
this view. Using only the content annotations, Ruta is then used to mark up which areas to
include in a new view for doing NLP. This approach gives us maximum traceability of the normalization

However, the different types of content annotations can sometimes interfere with the rules
in ways beyond our control, and our current solution leads to more awkward rules that are
hard to verify and to a less performant implementation. In our case the problem would be
better solved if we could simply tell Ruta to ignore certain types, i.e. make them invisible
to the Ruta rules. Preferably we want to be able to add and remove filtered types in the
script, similar to how it works with the coverage-based type
Please see also this mailing list thread where a toy example of the problem is discussed:

