tika-dev mailing list archives

From "Tim Allison (JIRA)" <j...@apache.org>
Subject [jira] [Created] (TIKA-2749) OCR on PDFs should "just work" out of the box
Date Thu, 04 Oct 2018 12:41:04 GMT
Tim Allison created TIKA-2749:

             Summary: OCR on PDFs should "just work" out of the box
                 Key: TIKA-2749
                 URL: https://issues.apache.org/jira/browse/TIKA-2749
             Project: Tika
          Issue Type: Task
            Reporter: Tim Allison

There are now two different ways (each with various parameters) to trigger OCR on inline images
within PDFs.  The user has to 1) know that these options exist and then 2) choose to
turn one of them on.

I think we should make OCR on PDFs "just work", perhaps with a hybrid strategy that combines
the two options.  Users should still be able to configure OCR as they wish, of course.

