tika-dev mailing list archives

From "Tyler Palsulich (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-1445) Figure out how to add Image metadata extraction to Tesseract parser
Date Mon, 27 Oct 2014 19:19:33 GMT

    [ https://issues.apache.org/jira/browse/TIKA-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185644#comment-14185644 ]

Tyler Palsulich commented on TIKA-1445:

bq. Doh! Send in a DefaultHandler instead of BodyContentHandler to the "otherParser"
I made the same mistake.

I think our ideas are very similar, but I offloaded the dynamic loading to {{DefaultParser.getAllParsersFor}},
since it already handles service loading. My logic for getting the underlying DefaultParser
from the AutoDetectParser is somewhat hacky, though. +1 to the expanded tests and to always parsing with
the otherParser!
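
For illustration only, here is a rough sketch of the handler point quoted above: the OCR pass gets the handler that collects text, while every other parser gets a plain SAX DefaultHandler, so its content events are dropped and only its metadata survives. This is not the code from either attached patch. SketchParser, TextCollector, OCR, IMAGE and parseWithMetadata are hypothetical stand-ins invented for this sketch, not Tika APIs, and the "width" value is made up. Only org.xml.sax.ContentHandler and org.xml.sax.helpers.DefaultHandler are real classes.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class MetadataDelegationSketch {

    /** Hypothetical stand-in for a parser interface, reduced to what the sketch needs. */
    interface SketchParser {
        void parse(String input, ContentHandler handler, Map<String, String> metadata)
                throws SAXException;
    }

    /** Collects character events; plays the role of the handler that receives the OCR text. */
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /** Hypothetical OCR parser: emits text and contributes no metadata. */
    static final SketchParser OCR = (input, handler, metadata) -> {
        char[] chars = ("ocr:" + input).toCharArray();
        handler.characters(chars, 0, chars.length);
    };

    /** Hypothetical image parser: emits its own content events and sets metadata. */
    static final SketchParser IMAGE = (input, handler, metadata) -> {
        char[] chars = "image-parser-noise".toCharArray();
        handler.characters(chars, 0, chars.length);
        metadata.put("width", "640"); // made-up value for the sketch
    };

    /** Runs the OCR pass, then always runs every other parser for metadata only. */
    static String parseWithMetadata(String input, List<SketchParser> others,
                                    Map<String, String> metadata) throws SAXException {
        TextCollector body = new TextCollector();
        OCR.parse(input, body, metadata);
        for (SketchParser other : others) {
            // DefaultHandler ignores every SAX event, so the other parser's
            // content never reaches the OCR text; its metadata still does.
            other.parse(input, new DefaultHandler(), metadata);
        }
        return body.text.toString();
    }

    public static void main(String[] args) throws SAXException {
        Map<String, String> metadata = new LinkedHashMap<>();
        String text = parseWithMetadata("scan.png", List.of(IMAGE), metadata);
        System.out.println(text);     // prints "ocr:scan.png"
        System.out.println(metadata); // prints "{width=640}"
    }
}
```

If the other parser were handed the text-collecting handler instead, its content events would be appended to the OCR output, which seems to be the mistake the quoted remark describes.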

> Figure out how to add Image metadata extraction to Tesseract parser
> -------------------------------------------------------------------
>                 Key: TIKA-1445
>                 URL: https://issues.apache.org/jira/browse/TIKA-1445
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Chris A. Mattmann
>            Assignee: Chris A. Mattmann
>             Fix For: 1.8
>         Attachments: TIKA-1445.Mattmann.101214.patch.txt, TIKA-1445.Palsulich.102614.patch,
> Now that Tesseract is the default image parser in Tika for many image types, consider
> how to add back in the metadata extraction capabilities by the other Image parsers.

This message was sent by Atlassian JIRA
