tika-dev mailing list archives

From "Kevin Oberlag (JIRA)" <j...@apache.org>
Subject [jira] [Created] (TIKA-2290) PDFParser 'ocr' properties cannot be set via headers when using Tika JAXRS
Date Mon, 06 Mar 2017 22:13:32 GMT
Kevin Oberlag created TIKA-2290:
-----------------------------------

             Summary: PDFParser 'ocr' properties cannot be set via headers when using Tika JAXRS
                 Key: TIKA-2290
                 URL: https://issues.apache.org/jira/browse/TIKA-2290
             Project: Tika
          Issue Type: Bug
          Components: ocr, parser
    Affects Versions: 1.14, 1.13
            Reporter: Kevin Oberlag


I have created a Stack Overflow question on this topic [here | http://stackoverflow.com/questions/42602834/x-tika-pdfocrstrategy-is-an-invalid-x-tika-ocr-header-error],
but I'll restate the main issue here.

I am trying to use Tika JAXRS and add headers to set PDFParser properties, specifically
the ocrStrategy property. However, when I add the header X-Tika-PDFocrStrategy, I get
an error stating that it is an invalid X-Tika-OCR header.
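
For reference, a request along the following lines should show the setup. This is only
a minimal sketch: port 9998 and the /tika path are the Tika server defaults, the class
and method names are made up for illustration, and the header value ocr_only is assumed
to correspond to one of the PDFParser OCR strategies. The request is built but not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class OcrStrategyRequest {
    // Builds (without sending) a PUT request that carries the PDFParser header.
    // serverUrl is assumed to point at a running Tika JAXRS server,
    // e.g. http://localhost:9998 (the default port).
    static HttpRequest build(String serverUrl, byte[] pdfBytes) {
        return HttpRequest.newBuilder(URI.create(serverUrl + "/tika"))
                // Header under discussion; the value "ocr_only" is an assumption.
                .header("X-Tika-PDFocrStrategy", "ocr_only")
                .header("Accept", "text/plain")
                .PUT(HttpRequest.BodyPublishers.ofByteArray(pdfBytes))
                .build();
    }
}
```

Sending this with java.net.http.HttpClient against an affected version (1.13 or 1.14)
is what triggers the error described above.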

After looking into the source code, I believe the issue might be with the 'fillParseContext'
method in the TikaResource.java file.

The if statement first checks whether the key starts with the OCR header prefix, and since
the PDFParser's property name contains 'ocr', the code tries to find a property named
'ocrStrategy' in the OCRParser class, which doesn't exist.
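
To illustrate the suspected routing problem, here is a small self-contained sketch. It
is not the TikaResource code: the class, the method names, and in particular the loose
case-insensitive match in looksLikeOcrHeader are assumptions chosen only to reproduce
the symptom described above. The second method shows how a strict prefix match would
route the same header.

```java
import java.util.Locale;

public class HeaderRoutingSketch {
    static final String OCR_PREFIX = "X-Tika-OCR";
    static final String PDF_PREFIX = "X-Tika-PDF";

    // ASSUMPTION: a loose, case-insensitive check standing in for whatever
    // the real code does. It matches any header whose name contains "ocr".
    static boolean looksLikeOcrHeader(String key) {
        return key.toLowerCase(Locale.ROOT).contains("ocr");
    }

    // OCR branch tested first, so a PDF header containing "ocr" never
    // reaches the PDF branch.
    static String routeAsReported(String key) {
        if (looksLikeOcrHeader(key)) return "ocr";
        if (key.startsWith(PDF_PREFIX)) return "pdf";
        return "none";
    }

    // Strict, case-sensitive prefix match on both branches.
    static String routeStrict(String key) {
        if (key.startsWith(OCR_PREFIX)) return "ocr";
        if (key.startsWith(PDF_PREFIX)) return "pdf";
        return "none";
    }

    public static void main(String[] args) {
        String header = "X-Tika-PDFocrStrategy";
        System.out.println(routeAsReported(header)); // routed to the OCR config
        System.out.println(routeStrict(header));     // routed to the PDF config
    }
}
```

With the loose match, X-Tika-PDFocrStrategy is handed to the OCR configuration, which
has no matching property; with the strict match it reaches the PDFParser configuration.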



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
