[ https://issues.apache.org/jira/browse/TIKA-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison updated TIKA-1445:
------------------------------
Attachment: TIKA-1445_tallison_v3_20141027.patch
This version subclasses Parser to create an ImageMetaParser class, which our current image metadata parsers then extend.
It also adds a DefaultImageMetadataparser that is a copy-and-paste of DefaultParser; unfortunately, the static loader can't be overridden.
Regular parsers are now specified in the Parser services file, and ImageMetadataParsers in a separate services file.
I don't like that this creates a new "class" of parsers, but I can't think of another way to guarantee that the OCRParser will find an image metadata parser correctly.
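A rough sketch of the idea, not the patch itself: the interface below is a simplified stand-in rather than Tika's real Parser API, and JpegMetaParser, OcrParser, and imageMetaRegistry() are hypothetical names used only for illustration. The in-memory list stands in for the separate services file, so the OCR parser only ever sees image metadata parsers and can never pick itself up.

```java
import java.util.ArrayList;
import java.util.List;

public class ImageMetaParserSketch {

    /** Simplified stand-in for a parser interface; NOT Tika's real Parser API. */
    interface Parser {
        boolean supports(String mediaType);
        String parse(String resource, String mediaType);
    }

    /** Hypothetical base type: only image metadata parsers extend it. */
    static abstract class ImageMetaParser implements Parser { }

    /** Hypothetical metadata parser for JPEG files. */
    static class JpegMetaParser extends ImageMetaParser {
        @Override public boolean supports(String mediaType) {
            return "image/jpeg".equals(mediaType);
        }
        @Override public String parse(String resource, String mediaType) {
            return "metadata(" + resource + ")";
        }
    }

    /** Hypothetical OCR parser that delegates only to the image metadata registry. */
    static class OcrParser implements Parser {
        private final List<ImageMetaParser> metaParsers;

        OcrParser(List<ImageMetaParser> metaParsers) {
            this.metaParsers = new ArrayList<>(metaParsers);
        }
        @Override public boolean supports(String mediaType) {
            return mediaType != null && mediaType.startsWith("image/");
        }
        @Override public String parse(String resource, String mediaType) {
            // Placeholder for the OCR step, followed by any matching metadata parsers.
            StringBuilder out = new StringBuilder("ocr(" + resource + ")");
            for (ImageMetaParser p : metaParsers) {
                if (p.supports(mediaType)) {
                    out.append(' ').append(p.parse(resource, mediaType));
                }
            }
            return out.toString();
        }
    }

    /** Stands in for the second services file: lists image metadata parsers only. */
    static List<ImageMetaParser> imageMetaRegistry() {
        List<ImageMetaParser> registry = new ArrayList<>();
        registry.add(new JpegMetaParser());
        return registry;
    }

    static Parser buildOcrParser() {
        return new OcrParser(imageMetaRegistry());
    }

    public static void main(String[] args) {
        Parser ocr = buildOcrParser();
        System.out.println(ocr.parse("scan.jpg", "image/jpeg"));
        System.out.println(ocr.parse("scan.png", "image/png"));
    }
}
```

Because the registry is typed as List<ImageMetaParser>, the compiler rejects any attempt to register the OCR parser there, which is the guarantee the separate "class" of parsers is meant to buy.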
> Figure out how to add Image metadata extraction to Tesseract parser
> -------------------------------------------------------------------
>
> Key: TIKA-1445
> URL: https://issues.apache.org/jira/browse/TIKA-1445
> Project: Tika
> Issue Type: Bug
> Components: parser
> Reporter: Chris A. Mattmann
> Assignee: Chris A. Mattmann
> Fix For: 1.8
>
> Attachments: TIKA-1445.Mattmann.101214.patch.txt, TIKA-1445.Palsulich.102614.patch,
> TIKA-1445_tallison_20141027.patch.txt, TIKA-1445_tallison_v2_20141027.patch, TIKA-1445_tallison_v3_20141027.patch
>
>
> Now that Tesseract is the default image parser in Tika for many image types, consider
> how to restore the metadata extraction capabilities of the other image parsers.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)