tika-dev mailing list archives

From "Tyler Palsulich (JIRA)" <j...@apache.org>
Subject [jira] [Closed] (TIKA-1543) TesseractOCRParser.setTesseractPath() doesn't work on Linux
Date Sun, 22 Mar 2015 21:09:11 GMT

     [ https://issues.apache.org/jira/browse/TIKA-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tyler Palsulich closed TIKA-1543.
---------------------------------
    Resolution: Fixed

This isn't actually a problem. I just tested locally -- it works.

We have unit tests for the path, but it's difficult to test that extraction works with a non-standard
path, since we don't know what the path is...

I think the problem is either:
    The path you set is not the directory that contains the tesseract executable, or
    The path doesn't have a tessdata directory inside it.
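
For what it's worth, both conditions can be checked before the path is handed to Tika. The following is only a rough sketch using plain java.nio, not anything from the Tika code base: the class name TesseractPathCheck, the diagnose() helper, and the assumption that the binary is called "tesseract" (or "tesseract.exe") are all illustrative.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TesseractPathCheck {

    /**
     * Returns a description of the first problem found,
     * or null if the directory passes both checks.
     */
    static String diagnose(Path dir) {
        if (!Files.isDirectory(dir)) {
            return "not a directory: " + dir;
        }
        // Assumption: the binary is named "tesseract" ("tesseract.exe" on Windows).
        if (!isExecutableFile(dir.resolve("tesseract"))
                && !isExecutableFile(dir.resolve("tesseract.exe"))) {
            return "no tesseract executable in: " + dir;
        }
        if (!Files.isDirectory(dir.resolve("tessdata"))) {
            return "no tessdata directory in: " + dir;
        }
        return null;
    }

    // A directory can also carry the executable bit, so require a regular file.
    private static boolean isExecutableFile(Path p) {
        return Files.isRegularFile(p) && Files.isExecutable(p);
    }

    public static void main(String[] args) {
        // Falls back to the path from the issue report when no argument is given.
        Path dir = Paths.get(args.length > 0 ? args[0] : "/root/tesseract");
        String problem = diagnose(dir);
        System.out.println(problem == null ? "looks OK: " + dir : problem);
    }
}
```

Running it with the same directory you pass to setTesseractPath() should show which of the two conditions, if either, is not met. It does not prove that extraction will work, only that the layout matches what is described above.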

You can see all of the Tesseract debugging messages by enabling {{debug}} level logging (put
a [log4j.properties|https://github.com/apache/tika/blob/10298692cb27d1ad3732589930987e2fe2681ee8/tika-parsers/src/test/resources/log4j.properties]
file on your classpath and set the output level to {{debug}}).

I'd be happy to help you debug further.

> TesseractOCRParser.setTesseractPath() doesn't work on Linux
> -----------------------------------------------------------
>
>                 Key: TIKA-1543
>                 URL: https://issues.apache.org/jira/browse/TIKA-1543
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.7
>            Reporter: Sean Zhao
>             Fix For: 1.8
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> After calling setTesseractPath() to set the Tesseract path to a non-default path, such as /root/tesseract, calling TesseractOCRParser.parse() returns nothing.
> Not sure if this is related to TIKA-1421.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
