lucene-solr-user mailing list archives

From Alexandre Rafalovitch <arafa...@gmail.com>
Subject Re: Problem with Cyrillic letters through Tika OCR indexing
Date Fri, 10 Feb 2017 12:55:35 GMT
At what level exactly is this a problem? Are you looking for a way for Solr
to pass the -l rus flag to Tika?

Or are you saying that whatever OCR is used here is simply bad? In the second
case, this is probably not a question for Solr or even Tika, but for
whatever the underlying OCR library is.

The stack is deep here; more precision is required.
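One way to separate the Tika layer from the Solr layer is to send the image straight to a standalone Tika Server and set the OCR language per request. The Python sketch below assumes a Tika Server running on its default port 9998, and assumes that Tika Server maps X-Tika-OCR* request headers onto TesseractOCRConfig settings, so X-Tika-OCRLanguage: rus should correspond to Tesseract's -l rus. The helper names build_ocr_request and extract_text are hypothetical. Solr's embedded Tika is configured differently, so this only checks whether Tika itself produces correct text when it is given the right language.

```python
import urllib.request

# Assumption: a standalone Tika Server listening on its default port.
TIKA_URL = "http://localhost:9998/tika"


def build_ocr_request(image_bytes: bytes, lang: str = "rus") -> urllib.request.Request:
    """Build a PUT request asking Tika Server to extract plain text from an image."""
    return urllib.request.Request(
        TIKA_URL,
        data=image_bytes,
        method="PUT",
        headers={
            "Accept": "text/plain",
            # Assumed: Tika Server forwards X-Tika-OCR* headers to TesseractOCRConfig,
            # so this header plays the role of Tesseract's -l option.
            "X-Tika-OCRLanguage": lang,
        },
    )


def extract_text(image_path: str, lang: str = "rus") -> str:
    """Send the image to Tika Server and return the extracted text."""
    with open(image_path, "rb") as f:
        request = build_ocr_request(f.read(), lang)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

If extract_text("scan.png") (a hypothetical file name) returns readable Cyrillic while Solr indexing does not, the problem is in how Solr configures its Tika; if it is still garbled, the problem lies further down the stack.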

Good luck,
    Alex

On 10 Feb 2017 2:52 AM, "Абрашин, Игорь Олегович" <Igor.Abrashin@novatek.ru>
wrote:

Hello everyone, I have encountered the problem mentioned in the subject line.

The original image is attached, and the recognized text is below:
3ApaBCTyI7ITe 9| )KVIBy xopomo



Has anyone faced something similar?
I should mention that Tesseract recognizes it more accurately with the -l rus
option.
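To confirm this outside of Solr and Tika, Tesseract can be run directly on the same image with different language models. This minimal Python sketch assumes the tesseract binary and its Russian traineddata are installed and on PATH; the function names tesseract_cmd and ocr are hypothetical. Passing "stdout" as the output base makes Tesseract print the recognized text instead of writing a .txt file.

```python
import subprocess


def tesseract_cmd(image_path: str, lang: str = "rus") -> list[str]:
    """Build a Tesseract command line that OCRs image_path with the given language."""
    # CLI form: tesseract <image> <outputbase> -l <lang>;
    # "stdout" as the output base prints the text instead of writing <outputbase>.txt.
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr(image_path: str, lang: str = "rus") -> str:
    """Run Tesseract and return the recognized text (raises CalledProcessError on failure)."""
    result = subprocess.run(
        tesseract_cmd(image_path, lang),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling ocr(path, "eng") and ocr(path, "rus") on the same image should reproduce the difference described above, which points at the language setting rather than at the OCR engine itself.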

Thanks in advance!





*Best regards,*

*Игорь Абрашин*

*ООО «НОВАТЭК НТЦ»*

*Work phone: +7 (3452) 680-386*

*Internal corporate extension: 22-586*

