tika-dev mailing list archives

From "Tim Allison (JIRA)" <j...@apache.org>
Subject [jira] [Created] (TIKA-1948) Catch exceptions per page in PDFParser
Date Mon, 11 Apr 2016 19:44:25 GMT
Tim Allison created TIKA-1948:

             Summary: Catch exceptions per page in PDFParser
                 Key: TIKA-1948
                 URL: https://issues.apache.org/jira/browse/TIKA-1948
             Project: Tika
          Issue Type: Improvement
            Reporter: Tim Allison
            Assignee: Tim Allison
            Priority: Minor

In a discussion with [~tilman] (I can't recall where), I think he observed that we don't wrap
each page in its own try/catch.  If an early page throws an exception, it might still be possible
to extract text from later pages of a problematic PDF.

With minimal modifications we could add a try/catch per page, store the caught exceptions,
and then throw the first caught exception after the parse finishes.
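
A minimal sketch of that pattern, not the actual PDFParser change: the class name, the PageExtractor interface, and the Result holder are hypothetical stand-ins for whatever does the per-page extraction. Each page gets its own try/catch, caught exceptions are collected, and the first one is rethrown only after every page has been attempted.

```java
import java.util.ArrayList;
import java.util.List;

final class PerPageExtraction {

    /** Hypothetical stand-in for the code that extracts the text of one page. */
    interface PageExtractor {
        String extract(int pageIndex) throws Exception;
    }

    /** Text recovered so far, plus every exception caught along the way. */
    static final class Result {
        final StringBuilder text = new StringBuilder();
        final List<Exception> exceptions = new ArrayList<>();
    }

    /** Attempts every page; a failure on one page does not stop the later pages. */
    static Result extractAll(int pageCount, PageExtractor extractor) {
        Result result = new Result();
        for (int i = 0; i < pageCount; i++) {
            try {
                result.text.append(extractor.extract(i)).append('\n');
            } catch (Exception e) {
                // Remember the failure and move on to the next page.
                result.exceptions.add(e);
            }
        }
        return result;
    }

    /** Called after the parse finishes: rethrows the first caught exception, if any. */
    static void throwFirstIfAny(Result result) throws Exception {
        if (!result.exceptions.isEmpty()) {
            throw result.exceptions.get(0);
        }
    }
}
```

In this sketch the caller would hand off result.text before calling throwFirstIfAny, so the text from the readable pages is kept even though the parse as a whole still reports a failure.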
