tika-dev mailing list archives

From "Gabriel Miklos (JIRA)" <j...@apache.org>
Subject [jira] Created: (TIKA-389) Garbled metadata when dealing with encrypted PDF files.
Date Tue, 23 Mar 2010 00:27:27 GMT
Garbled metadata when dealing with encrypted PDF files.
-------------------------------------------------------

                 Key: TIKA-389
                 URL: https://issues.apache.org/jira/browse/TIKA-389
             Project: Tika
          Issue Type: Bug
          Components: metadata, parser
    Affects Versions: 0.6
         Environment: Windows 7 64-bit
            Reporter: Gabriel Miklos
            Priority: Minor


The code exhibiting this issue is very simple:

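        // file, tikaParser (a Tika Parser) and metadata (a Metadata instance)
        // are declared elsewhere; their setup is not shown in this report.
        InputStream input = new FileInputStream(file);
        ContentHandler textHandler = new BodyContentHandler();
        tikaParser.parse(input, textHandler, metadata);
        input.close();
        System.out.println(metadata);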

The output:
title=?a???▬÷&▼??♂?ŢjK???ž?↑M?A→<═]1
=╬\bK Author=═g?═?♦ Content-Type=application/pdf creator=?k?═?♦Ý`;Ý?)??/¶???Ě?3n
Î☼46ËO

Other than that, the extracted text is 100% correct.
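On a Windows console, control characters in a string may be displayed as OEM glyphs (for example ♂ or ▼), and characters the console code page cannot represent can be printed as ?, so output like the above may not show exactly what was stored in the Metadata object. The following is a minimal diagnostic sketch, assuming the goal is an unambiguous copy of each value for the issue. The helper `escapeForReport` is hypothetical and not part of Tika. It rewrites every character outside printable ASCII as a Java-style escape of its UTF-16 code unit.

```java
public class EscapeForReport {

    // Hypothetical helper (not a Tika API). Characters in the printable ASCII
    // range 0x20..0x7E pass through unchanged. Every other character, and the
    // backslash itself, is written as backslash + 'u' + four hex digits so
    // the exact code units survive any console or copy/paste.
    static String escapeForReport(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c >= 0x20 && c <= 0x7E && c != '\\') {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A literal stands in for a metadata value containing control bytes.
        String sample = "a" + (char) 0x0B + (char) 0x1F + "b";
        System.out.println("title=" + escapeForReport(sample));
    }
}
```

To apply it to the code in this report, one could loop over `metadata.names()` and print `name + "=" + escapeForReport(metadata.get(name))` for each entry instead of calling `System.out.println(metadata)`.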

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

