tika-dev mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2917) Extract metadata from inline images in PDFs
Date Wed, 31 Jul 2019 20:17:00 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897492#comment-16897492 ]

Hudson commented on TIKA-2917:
------------------------------

SUCCESS: Integrated in Jenkins build tika-branch-1x #228 (See [https://builds.apache.org/job/tika-branch-1x/228/])
TIKA-2917 -- extract metadata that accompanies inline images (tallison: [https://github.com/apache/tika/commit/fd0eeb93a254de9320d04775f492287a716f5e92])
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDFParser.java
* (add) tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDMetadataExtractor.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/image/xmp/JempboxExtractor.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/pdf/AbstractPDF2XHTML.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDF2XHTML.java


> Extract metadata from inline images in PDFs
> -------------------------------------------
>
>                 Key: TIKA-2917
>                 URL: https://issues.apache.org/jira/browse/TIKA-2917
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Assignee: Tim Allison
>            Priority: Minor
>
> Inline images may have XMP associated with them.  We are not currently extracting this metadata.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
