tika-dev mailing list archives

From "Tyler Palsulich" <tpalsul...@gmail.com>
Subject Re: Review Request 22402: Tika OCR
Date Mon, 11 Aug 2014 04:26:35 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22402/
-----------------------------------------------------------

(Updated Aug. 11, 2014, 4:26 a.m.)


Review request for tika and Chris Mattmann.


Bugs: TIKA-93
    https://issues.apache.org/jira/browse/TIKA-93


Repository: tika


Description
-------

Integrates Tesseract OCR with Tika through a new Parser. See TIKA-93.


Diffs
-----

  trunk/tika-parsers/src/main/java/org/apache/tika/parser/ocr/TesseractOCRConfig.java PRE-CREATION
  trunk/tika-parsers/src/main/java/org/apache/tika/parser/ocr/TesseractOCRParser.java PRE-CREATION
  trunk/tika-parsers/src/main/resources/META-INF/services/org.apache.tika.parser.Parser 1601508
  trunk/tika-parsers/src/test/java/org/apache/tika/parser/ocr/TesseractOCRTest.java PRE-CREATION
  trunk/tika-parsers/src/test/java/org/apache/tika/parser/pdf/PDFParserTest.java 1601508
  trunk/tika-server/src/test/java/org/apache/tika/server/TikaMimeTypesTest.java 1601508

Diff: https://reviews.apache.org/r/22402/diff/


Testing
-------

Tested by extracting the text from an embedded image in DOCX, PPTX, and PDF files.


Thanks,

Tyler Palsulich

