tika-dev mailing list archives

From "Juha Haaga (JIRA)" <j...@apache.org>
Subject [jira] [Created] (TIKA-1028) Tika-server quits parsing of rfc-822 document prematurely when it encounters encrypted zip file as attachment.
Date Wed, 21 Nov 2012 12:57:59 GMT
Juha Haaga created TIKA-1028:
--------------------------------

             Summary: Tika-server quits parsing of rfc-822 document prematurely when it encounters
encrypted zip file as attachment.
                 Key: TIKA-1028
                 URL: https://issues.apache.org/jira/browse/TIKA-1028
             Project: Tika
          Issue Type: Bug
          Components: mime, parser, server
    Affects Versions: 1.2, 1.3
            Reporter: Juha Haaga


The Zip parser in tika-server does not allow passing in a password for decrypting the zip
file, and it does not handle this unsupported feature gracefully. The problem occurs when a zip file
is attached to an email document being parsed. The parser gives up and throws the following exception:

WARNING: all: Unpacker failed
org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.pkg.PackageParser@10fcc945

Caused by: org.apache.commons.compress.archivers.zip.UnsupportedZipFeatureException: unsupported feature encryption used in entry
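For context, a ZIP entry is marked as encrypted by bit 0 of the "general purpose bit flag" in its local file header (bytes 6-7, little-endian, after the 4-byte `PK\3\4` signature and the 2-byte "version needed" field, per the PKWARE APPNOTE). The following is a minimal, JDK-only sketch of checking that bit before handing an archive to a parser, so a caller could choose to pass the file through instead of failing. It does not use commons-compress. The class and method names (`ZipEncryptionCheck`, `firstEntryEncrypted`, `buildSampleZip`) are hypothetical. The sketch also only inspects the first local header; a complete check would have to visit every entry, for example via the central directory.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipEncryptionCheck {
    // "PK\3\4": signature of a ZIP local file header, read little-endian.
    static final int LOCAL_HEADER_SIG = 0x04034b50;
    // Size of the fixed part of a local file header, in bytes.
    static final int LOCAL_HEADER_LEN = 30;

    // Reads an unsigned 16-bit little-endian value.
    static int le16(byte[] b, int off) {
        return (b[off] & 0xff) | (b[off + 1] & 0xff) << 8;
    }

    // Reads a 32-bit little-endian value.
    static int le32(byte[] b, int off) {
        return le16(b, off) | le16(b, off + 2) << 16;
    }

    /** True if the first entry's general purpose bit 0 (encrypted) is set. */
    static boolean firstEntryEncrypted(byte[] zip) {
        if (zip.length < LOCAL_HEADER_LEN || le32(zip, 0) != LOCAL_HEADER_SIG) {
            return false; // no local header at offset 0, so nothing to report
        }
        int flags = le16(zip, 6); // general purpose bit flag
        return (flags & 0x1) != 0;
    }

    // Builds a small unencrypted archive with the JDK's ZipOutputStream.
    static byte[] buildSampleZip() {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            zos.putNextEntry(new ZipEntry("report.txt"));
            zos.write("hello".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        byte[] zip = buildSampleZip();
        System.out.println(firstEntryEncrypted(zip)); // false
        zip[6] |= 0x1; // only flips the flag to simulate an encrypted entry
        System.out.println(firstEntryEncrypted(zip)); // true
    }
}
```

Flipping the bit does not really encrypt anything. It only shows which header field a parser would see for an entry like the one in the stack trace above.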

Instead of returning the successfully parsed components, Tika-server returns nothing. 

It would be better to return the rest of the parsed document contents, along with the untouched
offending zip file, in the archive that Tika-server returns as a result. Until zip file decryption
is supported, this would always return the zip file untouched; once decryption is implemented,
the untouched zip file should still be returned whenever a wrong password is provided.
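The requested behaviour can be sketched as a per-attachment loop that records a parse failure instead of letting it abort the whole message. This is a minimal illustration, not Tika's unpacker. `Attachment`, `AttachmentParser`, `Result` and `unpack` are hypothetical names, and the sketch catches `IOException` because that is the exception type the TIKA-198 message above reports.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class LenientUnpacker {
    // Hypothetical stand-in for one attachment extracted from an RFC 822 message.
    record Attachment(String name, byte[] data) {}

    // Hypothetical parser hook; Tika's real Parser interface looks different.
    @FunctionalInterface
    interface AttachmentParser {
        String parse(Attachment a) throws IOException;
    }

    // Text of the attachments that parsed, plus the ones handed back as-is.
    record Result(List<String> parsedText, List<Attachment> untouched) {}

    static Result unpack(List<Attachment> attachments, AttachmentParser parser) {
        List<String> text = new ArrayList<>();
        List<Attachment> untouched = new ArrayList<>();
        for (Attachment a : attachments) {
            try {
                text.add(parser.parse(a));
            } catch (IOException e) {
                // Keep the original bytes instead of discarding the whole message.
                untouched.add(a);
            }
        }
        return new Result(text, untouched);
    }

    public static void main(String[] args) {
        List<Attachment> mail = List.of(
            new Attachment("body.txt", "hello".getBytes(StandardCharsets.UTF_8)),
            new Attachment("secret.zip", new byte[] {0x50, 0x4b, 0x03, 0x04}));
        // Simulated parser: fails on the zip the way an encrypted entry would.
        AttachmentParser parser = a -> {
            if (a.name().endsWith(".zip")) {
                throw new IOException("unsupported feature encryption used in entry");
            }
            return new String(a.data(), StandardCharsets.UTF_8);
        };
        Result r = unpack(mail, parser);
        System.out.println(r.parsedText());              // [hello]
        System.out.println(r.untouched().get(0).name()); // secret.zip
    }
}
```

Once password support exists, the same `catch` branch would also cover the wrong-password case described above: the attachment simply lands in `untouched`.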

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
