tika-dev mailing list archives

From Eric Pugh <ep...@opensourceconnections.com>
Subject TesseractOCRParserTest needed extra parameters to run...
Date Tue, 20 Aug 2019 14:46:19 GMT
To get TesseractOCRParserTest to run after installing Tesseract on OS X with
“brew install tesseract”, I had to set the Tesseract and tessdata paths explicitly.

Any thoughts on how we could convey to a user that they might need to tweak the paths to run
the unit tests?  I was thinking about adding some sort of messaging, but I don’t know whether
Tika already has a pattern for that with these external dependencies.

Thoughts?
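
To make the "messaging" idea concrete, here is a rough sketch of one option, not anything that exists in Tika today: look for an explicit override first, fall back to searching PATH, and if nothing is found, print a hint that tells the user which knob to turn. The class name, the method names, and the "tesseract.path" system property are all made up for illustration.

```java
import java.io.File;
import java.util.Optional;

public class TesseractPathHint {

    // Hypothetical property name, chosen for this sketch; not an existing Tika setting.
    static final String PATH_PROPERTY = "tesseract.path";

    /**
     * Returns the first directory in searchPath that contains an executable
     * file with the given name, or empty if there is none.
     * (On Windows the name would need to be "tesseract.exe".)
     */
    static Optional<File> findDirContaining(String executable, String searchPath) {
        if (searchPath == null || searchPath.isEmpty()) {
            return Optional.empty();
        }
        for (String dir : searchPath.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            File candidate = new File(dir, executable);
            if (candidate.isFile() && candidate.canExecute()) {
                return Optional.of(new File(dir));
            }
        }
        return Optional.empty();
    }

    /** An explicit override wins; otherwise search the given PATH value. */
    static Optional<File> resolve(String override, String searchPath) {
        if (override != null && !override.isEmpty()) {
            return Optional.of(new File(override));
        }
        return findDirContaining("tesseract", searchPath);
    }

    /** The hint shown when the tests cannot find Tesseract. */
    static String skipMessage() {
        return "Skipping OCR tests: tesseract not found on PATH. "
                + "Re-run with -D" + PATH_PROPERTY + "=/path/to/tesseract/bin";
    }

    public static void main(String[] args) {
        Optional<File> dir = resolve(System.getProperty(PATH_PROPERTY), System.getenv("PATH"));
        if (dir.isPresent()) {
            // In the test, this directory would be handed to config.setTesseractPath(...).
            System.out.println("Using tesseract from " + dir.get());
        } else {
            System.err.println(skipMessage());
        }
    }
}
```

In the test itself, the message could be passed to JUnit 4's Assume.assumeTrue(String, boolean) so that it appears in the skip report instead of the test being skipped silently. The tessdata path could presumably be handled the same way with a second property.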

diff --git a/tika-parsers/src/test/java/org/apache/tika/parser/ocr/TesseractOCRParserTest.java b/tika-parsers/src/test/java/org/apache/tika/parser/ocr/TesseractOCRParserTest.java
index 9ebcee068..32db2c442 100644
--- a/tika-parsers/src/test/java/org/apache/tika/parser/ocr/TesseractOCRParserTest.java
+++ b/tika-parsers/src/test/java/org/apache/tika/parser/ocr/TesseractOCRParserTest.java
@@ -51,6 +51,7 @@ public class TesseractOCRParserTest extends TikaTest {
 
     public static boolean canRun() {
         TesseractOCRConfig config = new TesseractOCRConfig();
+        config.setTesseractPath("/usr/local/bin");
         TesseractOCRParserTest tesseractOCRTest = new TesseractOCRParserTest();
         return tesseractOCRTest.canRun(config);
     }
@@ -164,6 +165,8 @@ public class TesseractOCRParserTest extends TikaTest {
                           BasicContentHandlerFactory.HANDLER_TYPE handlerType,
                           TesseractOCRConfig.OUTPUT_TYPE outputType) throws Exception {
         TesseractOCRConfig config = new TesseractOCRConfig();
+        config.setTesseractPath("/usr/local/bin");
+        config.setTessdataPath("/usr/local/Cellar/tesseract/4.1.0/share/tessdata");
         config.setOutputType(outputType);
         
         Parser parser = new RecursiveParserWrapper(new AutoDetectParser(),
_______________________
Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com
| My Free/Busy <http://tinyurl.com/eric-cal>
 
Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>

This e-mail and all contents, including attachments, is considered to be Company Confidential
unless explicitly stated otherwise, regardless of whether attachments are marked as such.

