tika-dev mailing list archives

From "David Eric Pugh (Jira)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2969) Unit test for TesseractOCRParserTest.java has confusing behavior when Tesseract not on path
Date Sun, 20 Oct 2019 13:47:00 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955498#comment-16955498 ]

David Eric Pugh commented on TIKA-2969:

I noticed that when I run `mvn test` the output is:   

Tesseract executable isn't on the path, so skipping tests.  If Tesseract is installed in a
custom location, please update TesseractOCRConfig.properties in src/test/resources.
[WARNING] Tests run: 13, Failures: 0, Errors: 0, Skipped: 6, Time elapsed: 3.447 s - in org.apache.tika.parser.ocr.TesseractOCRParserTest

However, because of the canRun() guard, a test like testPDFOCR() appears to have completed
successfully, since it doesn't use assumeTrue() and so is never reported as skipped. Is the
best approach to fix the warning message and switch to assumeTrue() everywhere? I'd love
some input. https://github.com/apache/tika/pull/290
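
To make the difference concrete, here is a minimal, dependency-free sketch of why the two styles are reported differently. It does not use JUnit or the Tika test code: `AssumptionFailed`, `assumeTrue`, `canRun`, `run`, and both test methods are hypothetical stand-ins. It assumes only the general JUnit behavior that a failed assumption throws an exception which the runner counts as "skipped", while a test method that returns early is counted as a normal pass.

```java
class SkipReportingSketch {

    /** Stand-in for the exception JUnit throws when an assumption fails. */
    static class AssumptionFailed extends RuntimeException {
        AssumptionFailed(String message) {
            super(message);
        }
    }

    /** Stand-in for org.junit.Assume.assumeTrue(String, boolean). */
    static void assumeTrue(String message, boolean condition) {
        if (!condition) {
            throw new AssumptionFailed(message);
        }
    }

    /** Hypothetical probe: pretend Tesseract is not on the path. */
    static boolean canRun() {
        return false;
    }

    /** Guard style: the body is skipped, but the method returns normally. */
    static void testWithGuard() {
        if (!canRun()) {
            return; // the runner cannot tell this apart from a real pass
        }
        throw new AssertionError("OCR assertions would run here");
    }

    /** Assumption style: the runner is told explicitly that the test did not run. */
    static void testWithAssumption() {
        assumeTrue("Tesseract executable isn't on the path", canRun());
        throw new AssertionError("OCR assertions would run here");
    }

    /** Simplified runner: classifies a test by how it terminates. */
    static String run(Runnable test) {
        try {
            test.run();
            return "PASSED";
        } catch (AssumptionFailed e) {
            return "SKIPPED";
        } catch (AssertionError e) {
            return "FAILED";
        }
    }

    public static void main(String[] args) {
        System.out.println("guard:      " + run(SkipReportingSketch::testWithGuard));
        System.out.println("assumption: " + run(SkipReportingSketch::testWithAssumption));
    }
}
```

With Tesseract unavailable, the guard-style test is classified as PASSED and the assumption-style test as SKIPPED, which matches the mix of "run" and "Skipped" counts in the Maven output above.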

> Unit test for TesseractOCRParserTest.java has confusing behavior when Tesseract not on path
> -------------------------------------------------------------------------------------------
>                 Key: TIKA-2969
>                 URL: https://issues.apache.org/jira/browse/TIKA-2969
>             Project: Tika
>          Issue Type: Improvement
>          Components: ocr
>    Affects Versions: 1.22
>            Reporter: David Eric Pugh
>            Priority: Minor
> Tesseract isn't installed on my path by default; I have to set the tesseractPath and
> tessdataPath properties. In trying to sort things out I ran the TesseractOCRParserTest and
> was shocked that it worked. It wasn't until I dug in more that I realized that the unit
> tests check with the canRun() method, and then either don't run their body, with no feedback
> to the user, or use the assumeTrue() assumption, which just stops the unit test.
> This issue is to make this test communicate better for the next person!

This message was sent by Atlassian Jira