tika-dev mailing list archives

From "Tim Allison (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (TIKA-2290) PDFParser 'ocr' properties cannot be set via headers when using Tika JAXRS
Date Wed, 08 Mar 2017 02:38:38 GMT

     [ https://issues.apache.org/jira/browse/TIKA-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison resolved TIKA-2290.
       Resolution: Fixed
    Fix Version/s: 1.15

Thank you for opening this.  The problem was that reflection tries to set fields directly
and only handles String, int, double, and boolean fields, but not the OCR_Strategy enum.  As a
last-ditch effort, we now try to call the setX() setter with a String argument.
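
For readers following along, the fallback described above could look roughly like the sketch below. It is not the code committed for TIKA-2290: the class HeaderConfigSketch, the DemoConfig type, its Strategy enum, and the apply() helper are all hypothetical names made up for illustration. The sketch sets String, int, double, and boolean fields directly and, for any other field type, looks for a public setter that accepts a String.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.Locale;

class HeaderConfigSketch {

    // Hypothetical stand-in for a parser config; NOT Tika's PDFParserConfig.
    public static class DemoConfig {
        public enum Strategy { NO_OCR, OCR_ONLY }

        private Strategy ocrStrategy = Strategy.NO_OCR;
        private int ocrDPI = 300;

        public Strategy getOcrStrategy() { return ocrStrategy; }
        public int getOcrDPI() { return ocrDPI; }

        // String-accepting setter that the fallback path can find and call.
        public void setOcrStrategy(String name) {
            this.ocrStrategy = Strategy.valueOf(name.toUpperCase(Locale.ROOT));
        }
    }

    // Applies a header value to the named property of the target object.
    public static void apply(Object target, String property, String value) throws Exception {
        Field field = target.getClass().getDeclaredField(property);
        field.setAccessible(true);
        Class<?> type = field.getType();
        if (type == String.class) {
            field.set(target, value);
        } else if (type == int.class) {
            field.setInt(target, Integer.parseInt(value));
        } else if (type == double.class) {
            field.setDouble(target, Double.parseDouble(value));
        } else if (type == boolean.class) {
            field.setBoolean(target, Boolean.parseBoolean(value));
        } else {
            // Last resort: the field type (for example an enum) is not handled
            // above, so look for a public setter that takes a String.
            String setter = "set" + Character.toUpperCase(property.charAt(0))
                    + property.substring(1);
            Method method = target.getClass().getMethod(setter, String.class);
            method.invoke(target, value);
        }
    }

    public static void main(String[] args) throws Exception {
        DemoConfig config = new DemoConfig();
        apply(config, "ocrDPI", "150");            // primitive field, set directly
        apply(config, "ocrStrategy", "ocr_only");  // enum field, set via the String setter
        System.out.println(config.getOcrDPI() + " " + config.getOcrStrategy());
    }
}
```

If no String setter exists for the property, getMethod() throws NoSuchMethodException; a server would presumably translate that into an "invalid header" style error for the client.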

Let us know what else you find.

> PDFParser 'ocr' properties cannot be set via headers when using Tika JAXRS
> --------------------------------------------------------------------------
>                 Key: TIKA-2290
>                 URL: https://issues.apache.org/jira/browse/TIKA-2290
>             Project: Tika
>          Issue Type: Bug
>          Components: ocr, parser
>    Affects Versions: 1.13, 1.14
>            Reporter: Kevin Oberlag
>            Assignee: Tim Allison
>             Fix For: 2.0, 1.15
> I have created a stackoverflow question on this topic [here | http://stackoverflow.com/questions/42602834/x-tika-pdfocrstrategy-is-an-invalid-x-tika-ocr-header-error],
but I'll reiterate the main issue. 
> I am trying to use TikaJAXRS and add headers for setting PDFParser properties, specifically
the ocrStrategy property. However, when I add the header X-Tika-PDFocrStrategy, I get
an error stating that it is an invalid X-Tika-OCR header.
> After looking into the source code, I believe the issue might be with the 'fillParseContext'
method in the TikaResource.java file.
> The if statement first looks for a key that starts with the OCR header prefix, and since
the PDFParser's property name contains 'ocr', it is trying to find a property named 'ocrStrategy'
in the OCRParser class, which doesn't exist.
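
For context, the kind of request the report describes can be sketched as below. Only the header name X-Tika-PDFocrStrategy comes from the report; the server URL (http://localhost:9998/tika), the PUT method, and the header value ocr_only are assumptions, and the class and method names are hypothetical. The sketch only builds the request with java.net.http (Java 11+) and does not send it.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class TikaHeaderRequestSketch {

    // Builds, but does not send, a PUT request carrying the PDF OCR header.
    // The URL and the "ocr_only" value are assumptions for illustration.
    public static HttpRequest build(byte[] pdfBytes) {
        return HttpRequest.newBuilder(URI.create("http://localhost:9998/tika"))
                .header("X-Tika-PDFocrStrategy", "ocr_only")
                .header("Content-Type", "application/pdf")
                .PUT(HttpRequest.BodyPublishers.ofByteArray(pdfBytes))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build(new byte[0]);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(request.headers().map());
    }
}
```

To actually send it, pass the request to HttpClient.newHttpClient().send(...) with a running tika-server at the assumed address.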
