tika-dev mailing list archives

From "Tim Allison (JIRA)" <j...@apache.org>
Subject [jira] [Created] (TIKA-2550) ToTextHandler includes <style/> element content
Date Mon, 22 Jan 2018 20:15:00 GMT
Tim Allison created TIKA-2550:
---------------------------------

             Summary: ToTextHandler includes <style/> element content
                 Key: TIKA-2550
                 URL: https://issues.apache.org/jira/browse/TIKA-2550
             Project: Tika
          Issue Type: Bug
            Reporter: Tim Allison


When the ToTextHandler is used to process .java files, the content of the <style/> element
is included in the extracted text, e.g.:

{noformat}
testFile
code {
color: rgb(0,0,0); font-family: monospace; font-size: 12px; white-space: nowrap;
}
.java_plain {
color: rgb(0,0,0);
}
.java_keyword {
color: rgb(0,0,0); font-weight: bold;
}
.java_javadoc_tag {
color: rgb(147,147,147); background-color: rgb(247,247,247); font-style: italic; font-weight: bold;
}
h1 {
font-family: sans-serif; font-size: 16pt; font-weight: bold; color: rgb(0,0,0); background: rgb(210,210,210); border: solid 1px black; padding: 5px; text-align: center;
}
.java_type {
color: rgb(0,44,221);
}
.java_literal {
color: rgb(188,0,0);
}
.java_javadoc_comment {
color: rgb(147,147,147); background-color: rgb(247,247,247); font-style: italic;
}
.java_operator {
color: rgb(0,124,31);
}
.java_separator {
color: rgb(0,33,255);
}
.java_comment {
color: rgb(147,147,147); background-color: rgb(247,247,247);
}

testFile/*************************************************************************
 *  Compilation:  javac HelloWorld.java
 *  Execution:    java HelloWorld
 *
 *  Prints "Hello, World". By tradition, this is everyone's first program.
 *
 *************************************************************************/

public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello, World");
    }

}

{noformat}
Is this what we want as the default behavior?
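
For discussion, here is a minimal sketch of one possible alternative behavior: a SAX handler that collects text but drops character data inside <style>. It uses only the JDK's SAX classes, not Tika's. The class name StyleSkippingTextHandler and its extractText helper are hypothetical and are not part of Tika or a proposed patch. The sample XHTML in main is made up to resemble the output above.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler (not a Tika class): gathers character data,
// but ignores anything nested inside a <style> element.
class StyleSkippingTextHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private int styleDepth = 0; // greater than 0 while inside <style>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (isStyle(localName, qName)) {
            styleDepth++;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (isStyle(localName, qName) && styleDepth > 0) {
            styleDepth--;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only keep text that is not part of a stylesheet.
        if (styleDepth == 0) {
            text.append(ch, start, length);
        }
    }

    private static boolean isStyle(String localName, String qName) {
        // localName can be empty when the parser is not namespace-aware.
        String name = (localName == null || localName.isEmpty()) ? qName : localName;
        return "style".equalsIgnoreCase(name);
    }

    @Override
    public String toString() {
        return text.toString();
    }

    // Helper for the sketch: parse an XHTML string and return the kept text.
    static String extractText(String xhtml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        StyleSkippingTextHandler handler = new StyleSkippingTextHandler();
        factory.newSAXParser().parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        // Made-up input that mimics the structure of the reported output.
        String xhtml = "<html><head><title>testFile</title>"
                + "<style>.java_plain { color: rgb(0,0,0); }</style></head>"
                + "<body><p>public class HelloWorld {}</p></body></html>";
        System.out.println(extractText(xhtml));
    }
}
```

Counting depth instead of using a boolean flag keeps the handler correct even if <style> elements were ever nested. Whether suppressing <style> belongs in ToTextHandler itself or in the parser that emits the element is a separate question.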



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
