tika-dev mailing list archives

From "Christian Wolfe (JIRA)" <j...@apache.org>
Subject [jira] [Created] (TIKA-1703) Can't Specify Tesseract Data Folder Distinct from Tesseract Executable Path
Date Tue, 04 Aug 2015 01:45:04 GMT
Christian Wolfe created TIKA-1703:
-------------------------------------

             Summary: Can't Specify Tesseract Data Folder Distinct from Tesseract Executable Path
                 Key: TIKA-1703
                 URL: https://issues.apache.org/jira/browse/TIKA-1703
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.9
            Reporter: Christian Wolfe
            Priority: Minor
             Fix For: 1.9


If a user specifies the path to the Tesseract executable using {{TesseractOCRConfig.setTesseractPath}},
then Tika will assume that the Tesseract config folder (usually referred to as the 'tessdata'
folder) is in the same location. This is usually true in a Windows environment, where everything
is installed into a central location.

However, this is not necessarily the case in a Linux environment. If one were to build Tesseract
from source, for example, the config folder will be installed in a different location than
the Tesseract executable.

One way to fix this would be to add a way to specify the location of the Tesseract config
folder separately from the path to the executable.
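
As a rough illustration of that idea (a minimal sketch, not Tika's actual code): the class
{{OcrPaths}} and its method {{environment()}} below are hypothetical names invented for this
example. The sketch assumes the data folder is passed to the Tesseract process through the
TESSDATA_PREFIX environment variable, and that the executable path is used as a fallback
when no separate data folder is given, which keeps the current behavior for existing users.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical holder for the two paths; this is NOT Tika's TesseractOCRConfig. */
class OcrPaths {
    private final String tesseractPath; // folder containing the tesseract executable
    private final String tessdataPath;  // folder containing the 'tessdata' folder

    OcrPaths(String tesseractPath, String tessdataPath) {
        // Treat null as "not set" so the checks below stay simple.
        this.tesseractPath = tesseractPath == null ? "" : tesseractPath;
        this.tessdataPath = tessdataPath == null ? "" : tessdataPath;
    }

    /** Builds the extra environment entries for the Tesseract process. */
    Map<String, String> environment() {
        Map<String, String> env = new HashMap<>();
        if (!tessdataPath.isEmpty()) {
            // An explicitly configured data folder takes precedence.
            env.put("TESSDATA_PREFIX", tessdataPath);
        } else if (!tesseractPath.isEmpty()) {
            // Fallback: assume the data folder sits next to the executable.
            env.put("TESSDATA_PREFIX", tesseractPath);
        }
        // If neither path is set, Tesseract's own default lookup is left alone.
        return env;
    }
}
```

With this shape, a Linux source build could pass the executable folder and the data folder
as two different values, while a Windows install could set only the executable folder and
get the same result as today.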




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
