tika-dev mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2159) Handle pre-parse embedded object exceptions uniformly and more robustly
Date Thu, 10 Nov 2016 03:50:58 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15652931#comment-15652931 ]

Hudson commented on TIKA-2159:
------------------------------

SUCCESS: Integrated in Jenkins build Tika-trunk #1137 (See [https://builds.apache.org/job/Tika-trunk/1137/])
TIKA-2159 -- first step (tallison: rev 47ba703d6e682147fa3e6abb9a9cc756c7fc2760)
* (edit) tika-parsers/src/test/java/org/apache/tika/parser/mail/RFC822ParserTest.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/pkg/CompressorParser.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/mail/MailContentHandler.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/xml/WordMLParser.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/mbox/OutlookPSTParser.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDFParser.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/rtf/RTFEmbObjHandler.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/jdbc/JDBCTableReader.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/pkg/RarParser.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/apple/AppleSingleFileParser.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/AbstractPOIFSExtractor.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/TNEFParser.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/mbox/MboxParser.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/pdf/AbstractPDF2XHTML.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/OfficeParser.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml/AbstractOOXMLExtractor.java
* (add) tika-core/src/main/java/org/apache/tika/extractor/EmbeddedDocumentUtil.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDF2XHTML.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/jdbc/AbstractDBParser.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/pkg/PackageParser.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/xml/FictionBookParser.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/HSLFExtractor.java


> Handle pre-parse embedded object exceptions uniformly and more robustly
> -----------------------------------------------------------------------
>
>                 Key: TIKA-2159
>                 URL: https://issues.apache.org/jira/browse/TIKA-2159
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Tim Allison
>            Priority: Minor
>
> When an embedded document is parsed and causes an exception, we're currently catching
> that and swallowing it in ParsingEmbeddedDocumentExtractor (the default) or reporting it in
> the RecursiveParserWrapper by storing the stacktrace in the Metadata of the embedded document.
> However, if there's an exception during detection on the embedded stream or on getting
> the stream _before_ the stream hits the parser, we aren't handling that uniformly or robustly
> across parsers.
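
For readers of the archive, the kind of uniform pre-parse handling the issue asks for can be sketched roughly as below. This is only an illustration, not the code from the commit: every name in it (PreParseExceptionSketch, StreamSupplier, openOrRecord, the metadata key) is hypothetical, and a plain Map stands in for Tika's Metadata. The idea is that a failure while obtaining the embedded stream is recorded on the parent's metadata in one shared place, rather than each parser deciding on its own whether to swallow or rethrow it.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; none of these names are Tika APIs.
class PreParseExceptionSketch {

    // Assumed metadata key, invented for this example.
    public static final String EMBEDDED_STREAM_EXCEPTION =
            "X-Sketch:embedded_stream_exception";

    /** Hypothetical source of an embedded stream that may fail before parsing. */
    public interface StreamSupplier {
        InputStream open() throws IOException;
    }

    /** Stores the stack trace in the parent's metadata (a Map stands in for Metadata). */
    public static void recordEmbeddedStreamException(
            Throwable t, Map<String, String> parentMetadata) {
        StringWriter trace = new StringWriter();
        t.printStackTrace(new PrintWriter(trace, true));
        parentMetadata.put(EMBEDDED_STREAM_EXCEPTION, trace.toString());
    }

    /**
     * Opens the embedded stream. On failure, records the exception on the
     * parent and returns null so the caller can skip this attachment and
     * continue with the rest of the container.
     */
    public static InputStream openOrRecord(
            StreamSupplier supplier, Map<String, String> parentMetadata) {
        try {
            return supplier.open();
        } catch (IOException | RuntimeException e) {
            recordEmbeddedStreamException(e, parentMetadata);
            return null;
        }
    }

    public static void main(String[] args) {
        Map<String, String> parentMetadata = new LinkedHashMap<>();

        // A stream that opens normally leaves the metadata untouched.
        InputStream ok = openOrRecord(
                () -> new ByteArrayInputStream(new byte[] {1, 2, 3}), parentMetadata);
        System.out.println("opened: " + (ok != null));

        // A stream that fails before parsing is recorded, not propagated.
        InputStream bad = openOrRecord(
                () -> { throw new IOException("truncated attachment"); }, parentMetadata);
        System.out.println("opened: " + (bad != null));
        System.out.println("recorded: "
                + parentMetadata.containsKey(EMBEDDED_STREAM_EXCEPTION));
    }
}
```

Returning null and recording on the parent is one possible design; the actual change may equally wrap or rethrow the exception, which this sketch does not try to reproduce.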



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)