tika-dev mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2917) Extract metadata from inline images in PDFs
Date Wed, 31 Jul 2019 16:43:00 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897315#comment-16897315 ]

Hudson commented on TIKA-2917:
------------------------------

UNSTABLE: Integrated in Jenkins build tika-2.x-windows #446 (See [https://builds.apache.org/job/tika-2.x-windows/446/])
TIKA-2917 -- extract metadata that accompanies inline images (tallison: rev 86325105ab206dca88d076dc865fcb17404c4531)
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDFParser.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/pdf/AbstractPDF2XHTML.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/image/xmp/JempboxExtractor.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDF2XHTML.java
* (add) tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDMetadataExtractor.java


> Extract metadata from inline images in PDFs
> -------------------------------------------
>
>                 Key: TIKA-2917
>                 URL: https://issues.apache.org/jira/browse/TIKA-2917
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Assignee: Tim Allison
>            Priority: Minor
>
> Inline images may have XMP associated with them. We are not currently extracting this
> metadata.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
