tika-dev mailing list archives

From Tyler Palsulich <tpalsul...@gmail.com>
Subject Re: Tesseract OCR always activated parser for images
Date Mon, 06 Oct 2014 23:49:30 GMT
Confirmed. This is why we ran into TIKA-1422. But, Chris' patch may provide
the backwards compatibility you're looking for. What do you think?


On Mon, Oct 6, 2014 at 7:47 PM, Lewis John Mcgibbney <
lewis.mcgibbney@gmail.com> wrote:

> Hi Folks,
> Now, once I install Tesseract, it is run for every image I pass through
> Tika server or Tika app.
> This is not okay as it does not give me the type of metadata I am looking for.
> This is just a note to folks, to say that AFAIK you would need to
> unregister the parser from [0] then rebuild from source in order to
> maintain backwards compatibility in this regard.
> Before I log a ticket for this, can anyone else confirm this please?
> Thanks
> Lewis
> [0]
> https://svn.apache.org/repos/asf/tika/trunk/tika-parsers/src/main/resources/META-INF/services/org.apache.tika.parser.Parser
> --
> *Lewis*
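
For anyone wanting to try the workaround Lewis describes, here is a minimal sketch of the "unregister" step: it drops one entry from the lines of a service provider file such as [0]. The class name org.apache.tika.parser.ocr.TesseractOCRParser is an assumption (check it against the actual entry in your checkout), the helper ServiceFileFilter is hypothetical, and the sample lines are a made-up excerpt rather than the real file contents. You would still need to write the result back and rebuild from source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Tika.
final class ServiceFileFilter {
    // Assumed name of the OCR parser class; verify against your service file.
    static final String OCR_PARSER = "org.apache.tika.parser.ocr.TesseractOCRParser";

    // Returns the lines of a META-INF/services file without the given provider entry.
    static List<String> removeEntry(List<String> lines, String className) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            // Service files allow '#' comments, so compare only the text before one.
            int hash = line.indexOf('#');
            String entry = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (!entry.equals(className)) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up excerpt for illustration only.
        List<String> sample = Arrays.asList(
            "# hypothetical excerpt",
            "org.apache.tika.parser.image.ImageParser",
            OCR_PARSER,
            "org.apache.tika.parser.jpeg.JpegParser");
        for (String line : removeEntry(sample, OCR_PARSER)) {
            System.out.println(line);
        }
    }
}
```

Comment lines and all other parser entries pass through unchanged, so the remaining image parsers stay registered.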
