tika-dev mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2550) ToTextHandler includes <style/> element content
Date Mon, 03 Dec 2018 16:09:00 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707435#comment-16707435 ]

Hudson commented on TIKA-2550:
------------------------------

SUCCESS: Integrated in Jenkins build Tika-trunk #1603 (See [https://builds.apache.org/job/Tika-trunk/1603/])
TIKA-2550 -- make sure that ToTextHandler's new behavior of ignoring (tallison: [https://github.com/apache/tika/commit/a178f61b1ba63f2e81d4c0fb6244a03de95f6399])
* (edit) tika-core/src/test/java/org/apache/tika/TikaTest.java
* (edit) tika-parsers/src/test/java/org/apache/tika/parser/html/HtmlParserTest.java


> ToTextHandler includes <style/> element content
> -----------------------------------------------
>
>                 Key: TIKA-2550
>                 URL: https://issues.apache.org/jira/browse/TIKA-2550
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Tim Allison
>            Assignee: Tim Allison
>            Priority: Trivial
>             Fix For: 2.0.0, 1.20
>
>
> When using the ToTextHandler to process .java files, the <style/> element content is included, e.g.:
> {noformat}
> testFile
> code {
> color: rgb(0,0,0); font-family: monospace; font-size: 12px; white-space: nowrap;
> }
> .java_plain {
> color: rgb(0,0,0);
> }
> .java_keyword {
> color: rgb(0,0,0); font-weight: bold;
> }
> .java_javadoc_tag {
> color: rgb(147,147,147); background-color: rgb(247,247,247); font-style: italic; font-weight: bold;
> }
> h1 {
> font-family: sans-serif; font-size: 16pt; font-weight: bold; color: rgb(0,0,0); background: rgb(210,210,210); border: solid 1px black; padding: 5px; text-align: center;
> }
> .java_type {
> color: rgb(0,44,221);
> }
> .java_literal {
> color: rgb(188,0,0);
> }
> .java_javadoc_comment {
> color: rgb(147,147,147); background-color: rgb(247,247,247); font-style: italic;
> }
> .java_operator {
> color: rgb(0,124,31);
> }
> .java_separator {
> color: rgb(0,33,255);
> }
> .java_comment {
> color: rgb(147,147,147); background-color: rgb(247,247,247);
> }
> testFile/*************************************************************************
>  *  Compilation:  javac HelloWorld.java
>  *  Execution:    java HelloWorld
>  *
>  *  Prints "Hello, World". By tradition, this is everyone's first program.
>  *
>  *************************************************************************/
> public class HelloWorld {
>     public static void main(String[] args) {
>         System.out.println("Hello, World");
>     }
> }
> {noformat}
> Is this what we want as the default behavior?
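
For illustration only, the behavior under discussion (dropping <style/> content from plain-text output) can be sketched with a plain SAX handler. This is not the change committed for TIKA-2550 and it does not use Tika's ToTextHandler. The class StyleSkippingTextSketch, the nested StyleSkippingHandler, and the extractText helper are hypothetical names, and the sketch assumes well-formed XHTML input parsed by the JDK's default (non-namespace-aware) SAX parser, so an element written with a prefix such as xhtml:style would not be matched.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: JDK-only, not Tika's ToTextHandler.
class StyleSkippingTextSketch {

    /** Collects character data, dropping anything nested inside a style element. */
    static class StyleSkippingHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        private int styleDepth = 0; // greater than 0 while inside <style>

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("style".equalsIgnoreCase(qName)) {
                styleDepth++;
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("style".equalsIgnoreCase(qName) && styleDepth > 0) {
                styleDepth--;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Only keep text that is not inside a style element.
            if (styleDepth == 0) {
                text.append(ch, start, length);
            }
        }

        @Override
        public String toString() {
            return text.toString();
        }
    }

    /** Parses well-formed XHTML and returns its text without style content. */
    static String extractText(String xhtml) {
        try {
            StyleSkippingHandler handler = new StyleSkippingHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xhtml)), handler);
            return handler.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
    }

    public static void main(String[] args) {
        // Made-up input loosely modeled on the report above.
        String xhtml = "<html><head><title>testFile</title>"
                + "<style>code { color: rgb(0,0,0); }</style></head>"
                + "<body><p>public class HelloWorld {}</p></body></html>";
        System.out.println(extractText(xhtml));
    }
}
```

The depth counter, rather than a boolean flag, keeps the handler correct if style elements were ever nested. The title and body text still pass through, which matches the shape of the output quoted in the report minus the CSS rules.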



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
