uima-dev mailing list archives

From "Vadym Oliinyk (JIRA)" <...@uima.apache.org>
Subject [jira] [Created] (UIMA-4115) TikaAnnotator: incorrect order of tags processing
Date Thu, 20 Nov 2014 20:09:34 GMT
Vadym Oliinyk created UIMA-4115:
-----------------------------------

             Summary: TikaAnnotator: incorrect order of tags processing
                 Key: UIMA-4115
                 URL: https://issues.apache.org/jira/browse/UIMA-4115
             Project: UIMA
          Issue Type: Bug
          Components: addons
    Affects Versions: 2.3.1Addons
            Reporter: Vadym Oliinyk


org.apache.uima.tika.MarkupAnnotator outputs incorrect content due to a bug in org.apache.uima.tika.MarkupHandler.
The problem is in the end-element event handler: MarkupHandler#endElement should close open
tags by removing them from the stack, so that when an end tag is found, the most recently
added matching tag is removed first. The current implementation instead searches from the
first (oldest) open element, so for nested tags of the same name it closes the outer element
when the inner one ends, and the processor annotates incorrect text spans.

The fix is trivial:
in MarkupHandler#endElement, replace startedAnnotations.iterator() with
startedAnnotations.descendingIterator(), so that the most recently opened tag is checked first.
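
To illustrate the difference, here is a minimal, self-contained sketch. It is not the actual MarkupHandler code: the class TagStackSketch, the Open holder, and the close and nestedDivs helpers are hypothetical names made up for this example, and it assumes that startedAnnotations behaves like a java.util.Deque to which each start tag is appended at the tail. The input imagined here is <div>outer<div>inner</div></div>, where the outer div starts at text offset 0 and the inner one at offset 5.

```java
import java.util.Deque;
import java.util.Iterator;
import java.util.LinkedList;

public class TagStackSketch {
    /** Hypothetical stand-in for a started annotation: tag name plus begin offset. */
    static final class Open {
        final String name;
        final int begin;
        Open(String name, int begin) { this.name = name; this.begin = begin; }
    }

    /**
     * Removes one open tag with the given name and returns its begin offset,
     * or -1 if no such tag is open. With newestFirst == false the search runs
     * oldest-first (the reported behaviour); with true it runs newest-first
     * (the proposed fix).
     */
    static int close(Deque<Open> started, String name, boolean newestFirst) {
        Iterator<Open> it = newestFirst ? started.descendingIterator() : started.iterator();
        while (it.hasNext()) {
            Open open = it.next();
            if (open.name.equals(name)) {
                it.remove();
                return open.begin;
            }
        }
        return -1;
    }

    /** Open-tag state after reading "<div>outer<div>inner" (assumed example input). */
    static Deque<Open> nestedDivs() {
        Deque<Open> started = new LinkedList<>();
        started.addLast(new Open("div", 0)); // outer <div>
        started.addLast(new Open("div", 5)); // inner <div>, opened after "outer"
        return started;
    }

    public static void main(String[] args) {
        // The first </div> in the input belongs to the inner element (begin 5).
        System.out.println(close(nestedDivs(), "div", false)); // prints 0: outer closed by mistake
        System.out.println(close(nestedDivs(), "div", true));  // prints 5: inner closed, as expected
    }
}
```

With iterator() the first end tag is paired with the outer start offset, so the span for the inner element covers too much text. With descendingIterator() it is paired with the inner start offset.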



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)