tika-dev mailing list archives

From Thamme Gowda <thammego...@apache.org>
Subject Re: Improving Tika OCR
Date Mon, 17 Apr 2017 13:31:48 GMT
Thanks, Kranthi, for volunteering to do this evaluation :-)

Best,
Thamme


--
Thamme Gowda
TG | @thammegowda
~Sent via somebody's IMAP server


On Apr 17, 2017 4:46 AM, "Kranthi Kiran G V" <kkranthi@student.nitw.ac.in>
wrote:

Hello Tim Allison,

I am currently working on improving Tika's OCR capabilities.
At the suggestion of Thamme Gowda (@thammegowda
<https://issues.apache.org/jira/secure/ViewProfile.jspa?name=thammegowda>),
I have started comparing the neural network subsystem of Tesseract 4.0
<https://github.com/tesseract-ocr/tesseract/wiki/NeuralNetsInTesseract4.00>
with the Visual Geometry Group's (VGG) models
<http://www.robots.ox.ac.uk/~vgg/research/text/>.

It would be great if you could provide the OCR test dataset that you
mentioned in one of the issues.

I will compare the two on running time, accuracy, memory consumption,
and invariance to lighting, orientation, etc. I will then integrate the
most suitable models into Tika's OCR.

Thank you,
Kranthi Kiran GV,
CS 3/4 Undergrad,
NIT Warangal
