uima-dev mailing list archives

From Marshall Schor <...@schor.com>
Subject Re: [jira] [Created] (UIMA-4115) TikaAnnotator: incorrect order of tags processing
Date Fri, 21 Nov 2014 16:51:12 GMT
Tommaso - could you take a look?

-Marshall

On 11/20/2014 3:09 PM, Vadym Oliinyk (JIRA) wrote:
> Vadym Oliinyk created UIMA-4115:
> -----------------------------------
>
>              Summary: TikaAnnotator: incorrect order of tags processing
>                  Key: UIMA-4115
>                  URL: https://issues.apache.org/jira/browse/UIMA-4115
>              Project: UIMA
>           Issue Type: Bug
>           Components: addons
>     Affects Versions: 2.3.1Addons
>             Reporter: Vadym Oliinyk
>
>
> org.apache.uima.tika.MarkupAnnotator outputs incorrect content due to a bug in
> org.apache.uima.tika.MarkupHandler. The problem is in the end-element event handler:
> MarkupHandler#endElement should close open tags by removing them from the stack, with
> the most recently added tag removed first when its corresponding end tag is found. The
> current implementation instead removes start elements beginning from the first open
> element, which results in incorrect text spans being annotated by the processor.
>
> The fix is trivial:
> in MarkupHandler#endElement replace startedAnnotations.iterator() with 
> startedAnnotations.descendingIterator().
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>
>
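
The ordering problem described in the report can be illustrated with a small, self-contained sketch. This is not the actual MarkupHandler code: the class TagStackSketch, the nested Open type, the character offsets, and the assumption that open tags are appended at the tail of a Deque are all hypothetical, chosen only to show why iterator() and descendingIterator() pair an end tag with different start tags when elements of the same name are nested.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch, not the real org.apache.uima.tika.MarkupHandler.
class TagStackSketch {
    // Stand-in for an annotation opened by a start tag (name plus begin offset).
    static final class Open {
        final String name;
        final int begin;
        Open(String name, int begin) { this.name = name; this.begin = begin; }
    }

    // Assumption: newly opened tags are appended at the tail.
    private final Deque<Open> startedAnnotations = new ArrayDeque<>();
    private final List<String> spans = new ArrayList<>();
    private final boolean newestFirst;

    TagStackSketch(boolean newestFirst) { this.newestFirst = newestFirst; }

    void startElement(String name, int offset) {
        startedAnnotations.addLast(new Open(name, offset));
    }

    void endElement(String name, int offset) {
        // iterator() walks from the oldest open tag; descendingIterator()
        // walks from the most recently opened one.
        Iterator<Open> it = newestFirst
                ? startedAnnotations.descendingIterator()
                : startedAnnotations.iterator();
        while (it.hasNext()) {
            Open open = it.next();
            if (open.name.equals(name)) {
                it.remove(); // close the first match in iteration order
                spans.add(name + "[" + open.begin + "," + offset + "]");
                return;
            }
        }
    }

    // Simulates the events for <div>a<div>b</div>c</div> over the text "abc".
    static List<String> run(boolean newestFirst) {
        TagStackSketch handler = new TagStackSketch(newestFirst);
        handler.startElement("div", 0); // outer <div>
        handler.startElement("div", 1); // inner <div>
        handler.endElement("div", 2);   // inner </div>
        handler.endElement("div", 3);   // outer </div>
        return handler.spans;
    }

    public static void main(String[] args) {
        System.out.println("iterator():           " + run(false));
        System.out.println("descendingIterator(): " + run(true));
    }
}
```

In this sketch, iterating from the head pairs the inner end tag with the outer start tag, producing the overlapping spans div[0,2] and div[1,3]. Iterating from the tail produces the properly nested spans div[1,2] and div[0,3], which matches the behaviour the report asks for.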

