tika-dev mailing list archives

From "Tim Allison (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (TIKA-1445) Figure out how to add Image metadata extraction to Tesseract parser
Date Wed, 29 Oct 2014 15:19:33 GMT

     [ https://issues.apache.org/jira/browse/TIKA-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison updated TIKA-1445:
    Attachment: TIKA-1445_tallison_v3_20141027.patch

This version subclasses Parser to create an ImageMetaParser class, which our current image
metadata parsers then extend.
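
To make the shape of that hierarchy concrete, here is a minimal sketch. It is not the code in the patch: SketchParser is a simplified stand-in for Tika's Parser (the real interface has a different signature), and SketchImageMetaParser, SketchJpegMetaParser, and ImageMetaSketch are hypothetical names used only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified stand-in for Tika's Parser; the real interface has a different signature.
interface SketchParser {
    void parse(byte[] data, Map<String, String> metadata);
}

// Hypothetical intermediate type: every image metadata parser extends this,
// so callers can ask for "an image metadata parser" by type.
abstract class SketchImageMetaParser implements SketchParser {
}

// Hypothetical concrete image metadata parser that records two toy fields.
class SketchJpegMetaParser extends SketchImageMetaParser {
    @Override
    public void parse(byte[] data, Map<String, String> metadata) {
        metadata.put("Content-Type", "image/jpeg");
        metadata.put("byte-length", Integer.toString(data.length));
    }
}

class ImageMetaSketch {
    // An OCR-style caller accepts only the narrower type, so it cannot be
    // handed a regular text-extracting parser by mistake.
    static Map<String, String> extractImageMetadata(SketchImageMetaParser parser, byte[] data) {
        Map<String, String> metadata = new LinkedHashMap<>();
        parser.parse(data, metadata);
        return metadata;
    }
}
```

The point of the intermediate type is that the compiler, rather than a naming convention, tells the OCR side which parsers are image metadata parsers.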

This adds a DefaultImageMetadataparser that is a copy and paste of DefaultParser; unfortunately,
we can't override the static loader!

We now specify regular parsers in the Parser services file and ImageMetadataParsers in a separate
services file.
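
As a rough illustration of why two service types lead to two services files, here is a sketch that uses the JDK's java.util.ServiceLoader as a stand-in; it is not the loading code from the patch. RegularParserSketch, ImageMetadataParserSketch, and SeparateServicesSketch are hypothetical names. The JDK loader looks for providers in a file named META-INF/services/ followed by the name of the service type, so each type gets its own file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical stand-ins for the two "classes" of parsers; not Tika's real types.
interface RegularParserSketch {
}

interface ImageMetadataParserSketch {
}

class SeparateServicesSketch {
    // Path of the provider-configuration file that java.util.ServiceLoader
    // consults for the given service type.
    static String servicesFileFor(Class<?> serviceType) {
        return "META-INF/services/" + serviceType.getName();
    }

    // Collects every provider registered for the given service type.
    // With no matching services file on the classpath the list is empty.
    static <S> List<S> loadAll(Class<S> serviceType) {
        List<S> found = new ArrayList<>();
        for (S provider : ServiceLoader.load(serviceType)) {
            found.add(provider);
        }
        return found;
    }
}
```

Calling loadAll(ImageMetadataParserSketch.class) only ever sees providers listed in that type's own file, which is what keeps the image metadata parsers apart from the regular ones.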

I don't like that this creates a new "class" of parsers, but I can't think of another way
of guaranteeing that the OCRParser will find an image metadata parser correctly.

> Figure out how to add Image metadata extraction to Tesseract parser
> -------------------------------------------------------------------
>                 Key: TIKA-1445
>                 URL: https://issues.apache.org/jira/browse/TIKA-1445
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Chris A. Mattmann
>            Assignee: Chris A. Mattmann
>             Fix For: 1.8
>         Attachments: TIKA-1445.Mattmann.101214.patch.txt, TIKA-1445.Palsulich.102614.patch,
> TIKA-1445_tallison_20141027.patch.txt, TIKA-1445_tallison_v2_20141027.patch, TIKA-1445_tallison_v3_20141027.patch
>
> Now that Tesseract is the default image parser in Tika for many image types, consider
> how to add back in the metadata extraction capabilities by the other Image parsers.
