tika-dev mailing list archives

From "Nick Burch (JIRA)" <j...@apache.org>
Subject [jira] [Created] (TIKA-2374) Tika App -z should extract PDF inline images by default
Date Mon, 22 May 2017 16:53:04 GMT
Nick Burch created TIKA-2374:

             Summary: Tika App -z should extract PDF inline images by default
                 Key: TIKA-2374
                 URL: https://issues.apache.org/jira/browse/TIKA-2374
             Project: Tika
          Issue Type: Improvement
          Components: cli
    Affects Versions: 1.14
            Reporter: Nick Burch

As discussed on dev@: if you use the Tika App with the default config and the {{-z}} extract
option, it extracts embedded resources but not PDF inline images. This is unexpected for
new users, who won't know that they need to pass in a custom config with the {{extractInlineImages}}
PDF parser option set.

If the user passes an explicit config to the app, we should respect it. However, if they
don't pass one and take the default, the {{-z}} option (and only that option) should enable whatever
settings are needed for extraction to work properly and fully (currently just {{extractInlineImages}}).

If possible/easy, the {{-z}} option should print some info to let affected users know that
the default config was tweaked to extract additional embedded resources.
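
A rough sketch of the decision logic proposed above, in plain Java with no Tika dependency. Every name here (ExtractDefaultsSketch, Resolved, resolve, the notice text) is hypothetical and for illustration only; none of it is the actual Tika App code. It assumes that inline-image extraction is off unless something turns it on, which is what the behaviour described above implies.

```java
public class ExtractDefaultsSketch {

    /** Hypothetical result: the effective option value plus an optional user notice. */
    static final class Resolved {
        final boolean extractInlineImages;
        final String notice; // null when the config was not tweaked

        Resolved(boolean extractInlineImages, String notice) {
            this.extractInlineImages = extractInlineImages;
            this.notice = notice;
        }
    }

    /**
     * @param extractRequested true if the user passed -z
     * @param userConfigValue  the extractInlineImages value from an explicit
     *                         user config, or null if no config was passed
     */
    static Resolved resolve(boolean extractRequested, Boolean userConfigValue) {
        if (userConfigValue != null) {
            // An explicit config always wins, even if it disables inline images.
            return new Resolved(userConfigValue, null);
        }
        if (extractRequested) {
            // Default config plus -z: enable what full extraction needs and say so.
            return new Resolved(true,
                "Default config adjusted for -z: extractInlineImages enabled");
        }
        // Default config without -z: leave the (assumed) default untouched.
        return new Resolved(false, null);
    }

    public static void main(String[] args) {
        Resolved r = resolve(true, null);
        if (r.notice != null) {
            System.err.println(r.notice);
        }
        System.out.println("extractInlineImages=" + r.extractInlineImages);
    }
}
```

The point of keeping the explicit-config check first is that a user who deliberately sets the option to false is never overridden by -z.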

