tika-dev mailing list archives

From Thejan Wijesinghe <thejan.k.wijesin...@gmail.com>
Subject Re: Tess4j API for TIKA OCR parser
Date Tue, 07 Mar 2017 12:58:15 GMT
Hi Nick,

I thought the same thing. I will try to keep the public method signatures
unchanged and will send updates on my progress.

On Tue, Mar 7, 2017 at 5:48 PM, Nick Burch <apache@gagravarr.org> wrote:

> On Tue, 7 Mar 2017, Thejan Wijesinghe wrote:
>> I have already used the Tess4j API to rewrite the TesseractOCRParser class.
>> Although it successfully extracts content from most of the file types, it
>> fails some particular unit tests in the TesseractOCRParserTest class. I can
>> solve that. However, I want to know whether I can rewrite the entire
>> TesseractOCRParser class from the ground up. If I do that, there will be
>> many broken references in the internals of TIKA because, as I witnessed,
>> most of the classes use the TesseractOCRParser class indirectly.
> If you can, try to keep the public methods unchanged. That way, other
> callers of the class will be unaffected by your rewrite of the internal
> logic.
>
> Nick
