tika-dev mailing list archives

From "Damiano (JIRA)" <j...@apache.org>
Subject [jira] [Created] (TIKA-1396) Embedded images in PDF documents
Date Thu, 14 Aug 2014 14:46:12 GMT
Damiano created TIKA-1396:

             Summary: Embedded images in PDF documents
                 Key: TIKA-1396
                 URL: https://issues.apache.org/jira/browse/TIKA-1396
             Project: Tika
          Issue Type: Bug
          Components: cli
    Affects Versions: 1.5
         Environment: OS: Ubuntu 14.04.1 LTS

gcc version 4.8.2

java version "1.8.0_11"
Java(TM) SE Runtime Environment (build 1.8.0_11-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.11-b03, mixed mode)

            Reporter: Damiano
            Priority: Critical

I found a problem with PDF documents that contain embedded images.

When I run

java -jar tika-app-1.5.jar --extract tika.pdf

Tika does not find the image.

Is this a PDF-specific problem? If I run the same operation on a DOC document, Tika
finds the image correctly.
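
To help narrow down whether the problem is PDF-specific, it may be useful to confirm first that the file really stores the picture as an image XObject. The following is only a rough diagnostic sketch and is not part of Tika: the class name PdfImageMarkerScan and the method countImageMarkers are made up for illustration. It scans the raw bytes for "/Subtype /Image" dictionary entries. A count of 0 is inconclusive, because such dictionaries can sit inside compressed object streams, and inline images in compressed content streams are not detected either.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a Tika class): heuristic scan for image XObject markers.
public class PdfImageMarkerScan {

    // Matches "/Subtype /Image" with optional whitespace between the two names.
    private static final Pattern IMAGE_XOBJECT =
            Pattern.compile("/Subtype\\s*/Image\\b");

    // Counts uncompressed "/Subtype /Image" entries in the given PDF bytes.
    static int countImageMarkers(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to exactly one char, so binary data
        // is decoded without loss and can be searched with a regex.
        String text = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        Matcher matcher = IMAGE_XOBJECT.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PdfImageMarkerScan <file.pdf>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(countImageMarkers(data) + " image XObject marker(s) found");
    }
}
```

Running it as "java PdfImageMarkerScan tika.pdf" prints the number of markers found. A non-zero count would suggest the image is present in the file as an XObject, so the missing output is more likely on the extraction side.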

This message was sent by Atlassian JIRA
