tika-dev mailing list archives

From kevin slote <kslo...@gmail.com>
Subject Re: [jira] [Commented] (TIKA-93) OCR support
Date Thu, 21 Aug 2014 21:09:58 GMT
Is Tesseract in trunk yet? If so, where can I find it? Also, Petr, would
you mind posting your tika-config.xml?


On Wed, Aug 20, 2014 at 3:36 AM, Petr Vas (JIRA) <jira@apache.org> wrote:

>
>     [
> https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14103563#comment-14103563
> ]
>
> Petr Vas commented on TIKA-93:
> ------------------------------
>
> No problem )
>
> > OCR support
> > -----------
> >
> >                 Key: TIKA-93
> >                 URL: https://issues.apache.org/jira/browse/TIKA-93
> >             Project: Tika
> >          Issue Type: New Feature
> >          Components: parser
> >            Reporter: Jukka Zitting
> >            Assignee: Chris A. Mattmann
> >            Priority: Minor
> >             Fix For: 1.7
> >
> >         Attachments: TIKA-93.patch, TIKA-93.patch, TIKA-93.patch,
> >                      TIKA-93.patch, TesseractOCRParser.patch, TesseractOCRParser.patch,
> >                      TesseractOCR_Tyler.patch, TesseractOCR_Tyler_v2.patch, testOCR.docx,
> >                      testOCR.pdf, testOCR.pptx
> >
> >
> > I don't know of any decent open source pure Java OCR libraries, but
> > there are command line OCR tools like Tesseract
> > (http://code.google.com/p/tesseract-ocr/) that could be invoked by Tika to
> > extract text content (where available) from image files.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.2#6252)
>
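
For anyone following the thread: the issue description quoted above proposes
having Tika invoke a command-line OCR tool such as Tesseract. Below is a
minimal sketch of that idea in plain Java, assuming a "tesseract" binary is on
the PATH and using the 3.x-style invocation "tesseract <image> <outputbase>
[-l <lang>]", which writes its result to "<outputbase>.txt". The class and
method names (TesseractCli, buildCommand, ocr) are hypothetical; this is not
the TesseractOCRParser from the patches attached to the issue, and it does not
plug into Tika's parser API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT Tika's TesseractOCRParser: it only illustrates
// shelling out to the tesseract CLI. Assumes "tesseract" is on the PATH.
class TesseractCli {

    // Builds: tesseract <image> <outputBase> [-l <lang>]
    // (3.x-style usage; tesseract appends ".txt" to outputBase itself.)
    static List<String> buildCommand(String image, String outputBase, String lang) {
        List<String> cmd = new ArrayList<>();
        cmd.add("tesseract");
        cmd.add(image);
        cmd.add(outputBase);
        if (lang != null && !lang.isEmpty()) {
            cmd.add("-l");
            cmd.add(lang);
        }
        return cmd;
    }

    // Runs tesseract on one image file and returns the recognized text.
    static String ocr(Path image, String lang) throws IOException, InterruptedException {
        Path workDir = Files.createTempDirectory("tesseract-");
        Path outputBase = workDir.resolve("out");
        Path textFile = workDir.resolve("out.txt");
        try {
            Process process = new ProcessBuilder(
                    buildCommand(image.toString(), outputBase.toString(), lang))
                    .inheritIO() // show tesseract's own messages on our console
                    .start();
            int exit = process.waitFor();
            if (exit != 0) {
                throw new IOException("tesseract exited with status " + exit);
            }
            return new String(Files.readAllBytes(textFile), StandardCharsets.UTF_8);
        } finally {
            // Remove the temporary output file and its directory.
            Files.deleteIfExists(textFile);
            Files.deleteIfExists(workDir);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java TesseractCli <image-file> [lang]");
            return;
        }
        String lang = args.length > 1 ? args[1] : "eng";
        System.out.println(ocr(Paths.get(args[0]), lang));
    }
}
```

Usage, assuming the class is compiled as-is: "java TesseractCli scan.png eng".
The temporary-file round trip is there because the 3.x usage shown writes its
output to "<outputbase>.txt" rather than returning it directly; a real Tika
parser would also need to handle timeouts and stream the text into a
ContentHandler.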
