uima-dev mailing list archives

From Peter Klügl (JIRA) <...@uima.apache.org>
Subject [jira] [Assigned] (UIMA-5147) RUTA leaves the contents of STYLE tags in plaintext
Date Wed, 19 Oct 2016 12:23:58 GMT

     [ https://issues.apache.org/jira/browse/UIMA-5147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Klügl reassigned UIMA-5147:

    Assignee: Peter Klügl

> RUTA leaves the contents of STYLE tags in plaintext
> ---------------------------------------------------
>                 Key: UIMA-5147
>                 URL: https://issues.apache.org/jira/browse/UIMA-5147
>             Project: UIMA
>          Issue Type: Bug
>          Components: Ruta
>    Affects Versions: 2.3.0ruta
>            Reporter: Dale Lane
>            Assignee: Peter Klügl
>            Priority: Minor
>             Fix For: 2.5.1ruta
> I'm using RUTA HtmlAnnotator and HtmlConverter to turn an HTML document into the plain
> text extracted from it, with annotations to represent the markup that was in the original.
> The contents of <STYLE> tags are showing up in the plaintext view, which isn't
> helpful. As STYLE isn't part of the document contents, I think it'd be better for this not
> to be added to plaintext, or at least for there to be an option to allow this to be excluded.
> (Apologies if I've missed a way to do this using the existing options)
> As an example of a simple recreate, a document like this can be used:
> {code:xml}
> <html><head>
>     <style>
>         /*  */
>         .test {
>             text-align: left;
>         }
>     </style>
> </head><body>Hello world</body></html>
> {code}
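
For illustration only, the behaviour requested above (STYLE contents left out of the plain text) can be sketched in plain Java. This is not Ruta's HtmlConverter and uses no UIMA API: the class name StyleStripSketch and the method toPlainText are hypothetical, and the regex-based tag stripping is an assumption that only suits simple inputs like the recreate document.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: drops <style> elements before stripping the remaining tags.
// It does not use or reproduce any Ruta/UIMA code.
class StyleStripSketch {
    // A whole <style ...>...</style> element, any case, spanning multiple lines.
    private static final Pattern STYLE_ELEMENT = Pattern.compile(
            "<style\\b[^>]*>.*?</style\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Any remaining tag (naive; assumes no '>' inside attribute values).
    private static final Pattern ANY_TAG = Pattern.compile("<[^>]+>");

    static String toPlainText(String html) {
        // Remove style elements together with their contents first.
        String withoutStyle = STYLE_ELEMENT.matcher(html).replaceAll("");
        // Then remove the other tags but keep their text content.
        return ANY_TAG.matcher(withoutStyle).replaceAll("").trim();
    }

    public static void main(String[] args) {
        String html = "<html><head>\n"
                + "    <style>\n"
                + "        /*  */\n"
                + "        .test {\n"
                + "            text-align: left;\n"
                + "        }\n"
                + "    </style>\n"
                + "</head><body>Hello world</body></html>";
        System.out.println(toPlainText(html)); // prints "Hello world"
    }
}
```

A real fix would presumably work on the annotations produced by HtmlAnnotator rather than on raw text; the sketch only shows the expected output for the recreate document.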

This message was sent by Atlassian JIRA
