uima-dev mailing list archives

From "Miguel Alvarez (JIRA)" <...@uima.apache.org>
Subject [jira] [Created] (UIMA-5757) Unable to extract features when annotation ends with HTML tag
Date Thu, 05 Apr 2018 23:02:00 GMT
Miguel Alvarez created UIMA-5757:
------------------------------------

             Summary: Unable to extract features when annotation ends with HTML tag
                 Key: UIMA-5757
                 URL: https://issues.apache.org/jira/browse/UIMA-5757
             Project: UIMA
          Issue Type: Bug
          Components: Ruta
    Affects Versions: 2.6.1ruta
         Environment: RUTA 2.6.1, Windows 10, Eclipse Mars, JDK 1.8.0_144
            Reporter: Miguel Alvarez


If there is an annotation that covers the whole sofa string, and the sofa string ends with
an HTML tag, RUTA does not seem able to extract the features of that annotation. For
instance, let's suppose the following document (represented as XMI):


{code:xml}
<?xml version="1.0" encoding="UTF-8"?>
<!-- XMI document -->
<xmi:XMI xmlns:xmi="http://www.omg.org/XMI" xmlns:cas="http:///uima/cas.ecore" xmlns:tcas="http:///uima/tcas.ecore"
xmlns:types="http:///com/acme/uima/types.ecore" xmi:version="2.0">
<cas:NULL xmi:id="0"/>
<tcas:DocumentAnnotation xmi:id="8" sofa="1" begin="0" end="12" language="es"/>
<types:MyDocument xmi:id="14" sofa="1" begin="0" end="12" documentId="test_docsize_39d5541c-5e7f-391c-95af-c82ce6306644"/>
<cas:Sofa xmi:id="1" sofaNum="1" sofaID="_InitialView" mimeType="text" sofaString="ABCDEFGHIJ&lt;p&gt;"/>
<cas:View sofa="1" members="8 14"/>
</xmi:XMI>
{code}
And the following RUTA script:

{code:java}
// RUTA script
STRING documentId = "Unknown";
com.acme.uima.types.MyDocument{-> GETFEATURE("documentId", documentId)};
LOG("Starting to process document: " + documentId);
{code}
The LOG action outputs "Unknown", i.e. GETFEATURE never assigns the variable. As soon as the
sofa string no longer ends with an HTML tag, the documentId value is logged as expected.
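One detail in the XMI above may be worth checking when reproducing this: the annotations' end offset versus the length of the sofa text once its XML entities are decoded. The small sketch below uses only the JDK (no UIMA or Ruta classes; `SofaOffsetCheck` and `unescapeXml` are hypothetical names made up for this illustration). It decodes the `sofaString` attribute and compares its length with `end="12"`. It only checks the arithmetic and says nothing about how Ruta itself processes the document.

```java
// Hypothetical helper: compares the annotation end offset from the XMI with the
// length of the sofa text after decoding the predefined XML entities.
class SofaOffsetCheck {

    // Decode the five predefined XML entities that can appear in an attribute value.
    static String unescapeXml(String s) {
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&apos;", "'")
                .replace("&amp;", "&"); // last, so "&amp;lt;" decodes to "&lt;", not "<"
    }

    public static void main(String[] args) {
        String sofaAttr = "ABCDEFGHIJ&lt;p&gt;"; // sofaString attribute as written in the XMI
        String sofa = unescapeXml(sofaAttr);     // decoded text: "ABCDEFGHIJ<p>"
        int annotationEnd = 12;                  // end="12" on DocumentAnnotation and MyDocument

        System.out.println("sofa length    = " + sofa.length());
        System.out.println("annotation end = " + annotationEnd);
        System.out.println("covers whole sofa? " + (annotationEnd == sofa.length()));
        // Text of the sofa that lies beyond the annotation's end offset.
        System.out.println("text after end: '" + sofa.substring(annotationEnd) + "'");
    }
}
```

Running it reports a sofa length of 13 against an end offset of 12, with `'>'` left after the end, so in this particular example the annotations stop inside the trailing `<p>` tag rather than covering the whole string. Whether that mismatch is related to the Ruta behavior described above is an open question.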

Any ideas what could be going on?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
