tika-dev mailing list archives

From "Nick Burch (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2159) Handle pre-parse embedded object exceptions uniformly and more robustly
Date Wed, 09 Nov 2016 17:38:58 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15651535#comment-15651535 ]

Nick Burch commented on TIKA-2159:

Given that we don't control all the parsers, I'm worried things may break oddly and unexpectedly
for some users if we go for #2. That said, if we throw a form of IOException with the details
the moment the parser tries to do anything to the input stream, it might not cause too many

{{ParsingEmbeddedDocumentExtractor}} already has some non-ideal error handling bits, so writing
some special keys onto the container might allow us to tidy some bits of that up too if we
do #1.
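
One possible reading of the "throw an IOException the moment the parser touches the stream" idea can be sketched with plain JDK classes. This is only an illustration, not Tika code: the class name {{FailedEmbeddedStream}} and its constructor arguments are hypothetical, and a real implementation would have to decide how it interacts with wrappers such as {{TikaInputStream}}.

```java
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical stand-in for an embedded stream that could not be obtained.
 * Nothing fails at construction time; the stored problem only surfaces as
 * an IOException when a parser actually tries to read from the stream.
 */
class FailedEmbeddedStream extends InputStream {
    private final String detail;      // human-readable description of the failure
    private final Throwable original; // the exception hit before parsing started

    FailedEmbeddedStream(String detail, Throwable original) {
        this.detail = detail;
        this.original = original;
    }

    // Build a fresh exception per call so each stack trace points at the caller.
    private IOException failure() {
        return new IOException(detail, original);
    }

    @Override
    public int read() throws IOException {
        throw failure();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        throw failure();
    }

    @Override
    public int available() throws IOException {
        throw failure();
    }
}
```

A parser that never reads the stream would see no error at all with this approach, and one that catches IOException broadly would hide the details, which is where the worry about parsers outside our control comes in.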

> Handle pre-parse embedded object exceptions uniformly and more robustly
> -----------------------------------------------------------------------
>                 Key: TIKA-2159
>                 URL: https://issues.apache.org/jira/browse/TIKA-2159
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Tim Allison
>            Priority: Minor
> When an embedded document is parsed and causes an exception, we're currently catching
> that and swallowing it in ParsingEmbeddedDocumentExtractor (the default) or reporting it in
> the RecursiveParserWrapper by storing the stacktrace in the Metadata of the embedded document.
> However, if there's an exception during detection on the embedded stream or on getting
> the stream _before_ the stream hits the parser, we aren't handling that uniformly or robustly
> across parsers.
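
For illustration, recording a pre-parse failure as a stack trace under a metadata key might look roughly like the sketch below. It uses a plain multi-valued map as a stand-in for Tika's Metadata object, and both the class name and the key name are hypothetical, not existing Tika identifiers.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class PreParseExceptionRecorder {
    // Hypothetical key name chosen for this example; not an actual Tika key.
    static final String PRE_PARSE_EXCEPTION_KEY = "X-EXAMPLE:embedded_pre_parse_exception";

    /**
     * Appends the stack trace of t to the given metadata map, so that several
     * failing embedded objects in one container do not overwrite each other.
     */
    static void record(Map<String, List<String>> metadata, Throwable t) {
        StringWriter sw = new StringWriter();
        try (PrintWriter pw = new PrintWriter(sw)) {
            t.printStackTrace(pw);
        }
        metadata.computeIfAbsent(PRE_PARSE_EXCEPTION_KEY, k -> new ArrayList<>())
                .add(sw.toString());
    }
}
```

A caller would catch the exception raised during detection or while fetching the embedded stream, pass it to `record(...)`, and continue with the next embedded object instead of aborting the container parse.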
