lucene-solr-user mailing list archives

From Alexandre Rafalovitch <>
Subject Re: Problem with cyrillics letters through Tika OCR indexing
Date Fri, 10 Feb 2017 12:55:35 GMT
At what level exactly is this a problem? Are you looking for a way for Solr
to pass the -l rus flag through Tika to the OCR engine?

Or are you saying that whatever OCR is used here is bad? In the second
case, this is probably not a question for Solr or even Tika, but for
whatever the underlying OCR library is.

The stack is deep here; more precision is required.


On 10 Feb 2017 2:52 AM, "Абрашин, Игорь Олегович" <>

Hello, everyone. I've encountered the problem mentioned in the title.

The original image is attached, and the recognized text is below:
3ApaBCTyI7ITe 9| )KVIBy xopomo

Has anyone faced something similar?
I should mention that tesseract recognizes it more correctly with -l rus
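
A minimal sketch of that standalone check, assuming the tesseract binary and its Russian language data are installed. The helper names and the scan.png file name are hypothetical and only illustrate running tesseract directly, outside Solr and Tika:

```python
import shutil
import subprocess


def build_tesseract_command(image_path, lang="rus"):
    """Build a tesseract command line that prints recognized text to stdout."""
    # Passing "stdout" as the output base makes tesseract print the text
    # instead of writing it to a file; -l selects the language data.
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_image(image_path, lang="rus"):
    """Run tesseract on one image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,
        check=True,  # raise CalledProcessError if tesseract fails
    )
    return result.stdout.decode("utf-8")


if __name__ == "__main__":
    # Show the command that would be run for a hypothetical image file.
    print(" ".join(build_tesseract_command("scan.png")))
```

Comparing the output of ocr_image("scan.png", lang="rus") with lang="eng" on the same image would show whether the garbled text comes from the language setting rather than from Solr or Tika.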

Thanks in advance!

*Best regards,*

*Игорь Абрашин*


*Work phone: +7 (3452) 680-386*

*Internal corporate phone: 22-586*

[image: 121]
