tika-dev mailing list archives

From Nick Burch <apa...@gagravarr.org>
Subject Re: Tess4j API for TIKA OCR parser
Date Tue, 07 Mar 2017 12:18:01 GMT
On Tue, 7 Mar 2017, Thejan Wijesinghe wrote:
> I have already used the Tess4j API to rewrite the TesseractOCRParser class.
> Although it successfully extracts content from most file types, it fails
> some unit tests in the TesseractOCRParserTest class. I can solve that.
> However, I want to know whether I can rewrite the entire
> TesseractOCRParser class from the ground up. If I do that, there will be
> many broken links in the internals of TIKA because, as far as I can see,
> most of the classes use the TesseractOCRParser class indirectly.

If you can, try to keep the public methods unchanged. That way, other
callers of the class will be unaffected by your rewrite of the internal
logic.
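
As a rough illustration of that idea, here is a minimal Java sketch. It is
not taken from Tika or Tess4j: every name in it (OcrBackend,
CommandLineBackend, LibraryBackend, SimpleOcrParser, extractText) is
hypothetical, and both backends are stubs that decode the bytes as text
instead of running real OCR. The point is only that the public method keeps
its signature while the internal engine is swapped.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical internal abstraction; not part of Tika or Tess4j.
interface OcrBackend {
    String recognize(byte[] imageBytes);
}

// Stub standing in for the existing approach (calling an external executable).
final class CommandLineBackend implements OcrBackend {
    @Override
    public String recognize(byte[] imageBytes) {
        // A real backend would run OCR; this stub just decodes the bytes.
        return new String(imageBytes, StandardCharsets.UTF_8).trim();
    }
}

// Stub standing in for a library-based rewrite of the internals.
final class LibraryBackend implements OcrBackend {
    @Override
    public String recognize(byte[] imageBytes) {
        // Same stub behaviour, so callers cannot tell the backends apart.
        return new String(imageBytes, StandardCharsets.UTF_8).trim();
    }
}

// Hypothetical parser: its public surface does not depend on the backend.
final class SimpleOcrParser {
    private final OcrBackend backend;

    // Existing callers keep using the no-argument constructor.
    public SimpleOcrParser() {
        this(new LibraryBackend());
    }

    // Package-private constructor so tests can choose the backend.
    SimpleOcrParser(OcrBackend backend) {
        this.backend = backend;
    }

    // Public method whose signature stays the same across the rewrite.
    public String extractText(byte[] imageBytes) {
        if (imageBytes == null || imageBytes.length == 0) {
            return "";
        }
        return backend.recognize(imageBytes);
    }
}

class OcrParserDemo {
    public static void main(String[] args) {
        byte[] input = " scanned text ".getBytes(StandardCharsets.UTF_8);
        // Caller code is identical before and after the internal rewrite.
        System.out.println(new SimpleOcrParser().extractText(input));
    }
}
```

Because callers only ever see extractText, the existing unit tests can be
run against both backends to check that the rewrite behaves the same way.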

Nick
