tika-dev mailing list archives

From Kranthi Kiran G V <kkran...@student.nitw.ac.in>
Subject Improving Tika OCR
Date Mon, 17 Apr 2017 11:46:23 GMT
Hello Tim Allison,

I am currently working on improving Tika's OCR capabilities.
Following a suggestion from Thamme Gowda (@thammegowda
<https://issues.apache.org/jira/secure/ViewProfile.jspa?name=thammegowda>),
I have started comparing Tesseract 4.0's neural network
<https://github.com/tesseract-ocr/tesseract/wiki/NeuralNetsInTesseract4.00>
subsystem with the Visual Geometry Group's (VGG) models
<http://www.robots.ox.ac.uk/~vgg/research/text/>.

It would be great if you could provide the dataset for testing OCR that
you mentioned in one of the issues.

I plan to compare their running time at evaluation, accuracy, memory
consumption, and invariance to lighting, orientation, etc. I would then
integrate the most suitable models into Tika's OCR.

Thank you,
Kranthi Kiran GV,
CS 3/4 Undergrad,
NIT Warangal
