tika-dev mailing list archives

From Chris Mattmann <mattm...@apache.org>
Subject Re: Apache Tika
Date Wed, 03 May 2017 13:58:09 GMT
Hi Gorka,


See: http://wiki.apache.org/tika/TikaOCR/


Is that what you’re looking for? If so, you can simply enable OCR on the Tika REST server
and then point Tika Python at that server. Does that help?
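
For illustration, here is a minimal sketch of the client side, assuming a Tika REST server with OCR already enabled is listening on http://localhost:9998 (that address is an assumption, as are the helper names build_request and extract_text). It uses only the Python standard library rather than Tika Python, so the HTTP call itself is visible. The X-Tika-PDFextractInlineImages header is the one described on the TikaOCR wiki page; it asks the PDF parser to pass inline images on to OCR.

```python
import urllib.request

# Assumption: a Tika REST server with OCR (Tesseract) enabled is listening here.
TIKA_ENDPOINT = "http://localhost:9998"


def build_request(pdf_bytes, endpoint=TIKA_ENDPOINT):
    """Build (but do not send) a PUT request to the server's /tika resource."""
    return urllib.request.Request(
        url=endpoint.rstrip("/") + "/tika",
        data=pdf_bytes,
        method="PUT",
        headers={
            "Content-Type": "application/pdf",
            "Accept": "text/plain",
            # Ask the PDF parser to hand inline images to the OCR parser.
            "X-Tika-PDFextractInlineImages": "true",
        },
    )


def extract_text(pdf_path, endpoint=TIKA_ENDPOINT):
    """Send the PDF to the server and return the extracted text."""
    with open(pdf_path, "rb") as handle:
        request = build_request(handle.read(), endpoint)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling extract_text("scan.pdf") (a hypothetical file name) returns whatever text the server extracted, including OCR output for embedded images if OCR is working on the server side. With Tika Python, the equivalent step should be passing the same address as serverEndpoint to parser.from_file.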








From: gorka gallo <gorkagal@gmail.com>
Date: Wednesday, May 3, 2017 at 2:19 AM
To: "Mattmann, Chris A (3010)" <chris.a.mattmann@jpl.nasa.gov>
Subject: Apache Tika


Hi Chris, 


I am Gorka Gallo, a research technician from Bilbao, Spain.


Is there any method to extract embedded images in PDF files with Apache Tika using Python?




Best regards,

