tika-dev mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2187) Align default behavior of experimental docx parser with that of doc parser in handling delText
Date Thu, 01 Dec 2016 01:54:58 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15710536#comment-15710536 ]

Hudson commented on TIKA-2187:
------------------------------

SUCCESS: Integrated in Jenkins build tika-2.x #180 (See [https://builds.apache.org/job/tika-2.x/180/])
TIKA-2187 -- make "ignore deleted" as the default in the experimental (tallison: rev 3d08da79febc75d1ca0fd3293a5f383983057b00)
* (edit) tika-parsers/src/test/java/org/apache/tika/parser/microsoft/ooxml/xwpf/SXWPFExtractorTest.java
* (edit) tika-parser-modules/tika-parser-office-module/src/test/java/org/apache/tika/parser/microsoft/ooxml/xwpf/ml2006/Word2006MLParserTest.java
* (edit) tika-parser-modules/tika-parser-office-module/src/test/java/org/apache/tika/parser/microsoft/WordParserTest.java
* (add) tika-test-resources/src/test/resources/test-documents/testWORD_2006ml.doc
* (edit) CHANGES.txt
* (edit) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/microsoft/OfficeParserConfig.java
* (edit) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/microsoft/WordExtractor.java


> Align default behavior of experimental docx parser with that of doc parser in handling delText
> ----------------------------------------------------------------------------------------------
>
>                 Key: TIKA-2187
>                 URL: https://issues.apache.org/jira/browse/TIKA-2187
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Minor
>             Fix For: 2.0, 1.15
>
>
> Now that we can ignore delText via the experimental alternate SAXParser for .docx files, let's make that the default behavior to align with the expected behavior for our .doc parser (ignore deleted text).
> Let's also add the ability to include deleted text from .doc files.
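To illustrate the behavior this issue makes the default, here is a minimal, stdlib-only SAX sketch. It is NOT Tika's implementation (the real switch is wired through OfficeParserConfig and WordExtractor, per the commit above). It assumes a WordprocessingML body where tracked deletions appear as w:delText runs inside w:del, and it collects w:t text always but w:delText only when an includeDeleted flag is set. The class name DelTextSketch, the extract method, the SAMPLE fragment and the includeDeleted flag are all hypothetical names chosen for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: not Tika code, just the "ignore deleted text by default" idea.
public class DelTextSketch {
    // WordprocessingML main namespace
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical sample: one tracked deletion between two normal runs
    static final String SAMPLE = "<w:document xmlns:w=\"" + W_NS + "\"><w:body><w:p>"
            + "<w:r><w:t>kept </w:t></w:r>"
            + "<w:del w:id=\"1\" w:author=\"a\"><w:r><w:delText>deleted </w:delText></w:r></w:del>"
            + "<w:r><w:t>text</w:t></w:r>"
            + "</w:p></w:body></w:document>";

    // Collects w:t text always; w:delText text only when includeDeleted is true.
    static String extract(String documentXml, boolean includeDeleted) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed so uri/localName are populated
        SAXParser parser = factory.newSAXParser();
        StringBuilder out = new StringBuilder();

        DefaultHandler handler = new DefaultHandler() {
            private boolean capture = false; // true while inside a text element we keep

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (W_NS.equals(uri)) {
                    if ("t".equals(localName)) {
                        capture = true;
                    } else if ("delText".equals(localName)) {
                        capture = includeDeleted; // default (false) skips deleted text
                    }
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (W_NS.equals(uri) && ("t".equals(localName) || "delText".equals(localName))) {
                    capture = false;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (capture) {
                    out.append(ch, start, length);
                }
            }
        };

        parser.parse(new InputSource(new StringReader(documentXml)), handler);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(extract(SAMPLE, false)); // kept text
        System.out.println(extract(SAMPLE, true));  // kept deleted text
    }
}
```

Treating "skip delText" as the default and making inclusion opt-in mirrors what the issue asks for: both the .doc and .docx paths drop deleted text unless a caller explicitly asks for it.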



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
