[ https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168926#comment-14168926 ]

Chris A. Mattmann commented on TIKA-93:
---------------------------------------

Hi [~twigbranch] thanks for the comment, please see:

https://wiki.apache.org/tika/TikaOCR

Are you sure you have installed Tesseract with TIFF support? The wiki page above has some instructions for Mac and brew to get Tesseract with libTIFF.

> OCR support
> -----------
>
> Key: TIKA-93
> URL: https://issues.apache.org/jira/browse/TIKA-93
> Project: Tika
> Issue Type: New Feature
> Components: parser
> Reporter: Jukka Zitting
> Assignee: Chris A. Mattmann
> Priority: Minor
> Fix For: 1.7
>
> Attachments: Petr_tika-config.xml, TIKA-93.patch, TIKA-93.patch, TIKA-93.patch, TIKA-93.patch, TesseractOCRParser.patch, TesseractOCRParser.patch, TesseractOCR_Tyler.patch, TesseractOCR_Tyler_v2.patch, TesseractOCR_Tyler_v3.patch, TesseractOCR_Tyler_v4.patch, testOCR.docx, testOCR.pdf, testOCR.pptx
>
>
> I don't know of any decent open source pure Java OCR libraries, but there are command line OCR tools like Tesseract (http://code.google.com/p/tesseract-ocr/) that could be invoked by Tika to extract text content (where available) from image files.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)