tika-dev mailing list archives

From "Jukka Zitting (JIRA)" <j...@apache.org>
Subject [jira] Updated: (TIKA-346) ZipParser throws "invalid compression method" error for some archives
Date Sun, 13 Dec 2009 20:23:18 GMT

     [ https://issues.apache.org/jira/browse/TIKA-346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting updated TIKA-346:

    Attachment: TIKA-346.patch

The attached patch fixes this problem after recent Commons Compress changes related to COMPRESS-93.
We can apply the patch once Commons Compress 1.1 is available.

> ZipParser throws "invalid compression method" error for some archives
> ---------------------------------------------------------------------
>                 Key: TIKA-346
>                 URL: https://issues.apache.org/jira/browse/TIKA-346
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.5
>         Environment: Windows XP, JVM 1.6.16
>            Reporter: Robert Trickey
>         Attachments: moby.zip, TIKA-346.patch
> This could be a bug in the underlying Apache Commons code. When trying to parse the attached
> file to extract text content, an error is thrown with the following stack trace:
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pkg.ZipParser@1b963c4
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:122)
> 	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
> 	at my.code.wherever.....
> Caused by: java.lang.IllegalArgumentException: invalid compression method
> 	at java.util.zip.ZipEntry.setMethod(ZipEntry.java:209)
> 	at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.getNextZipEntry(ZipArchiveInputStream.java:146)
> 	at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.getNextEntry(ZipArchiveInputStream.java:188)
> 	at org.apache.tika.parser.pkg.PackageParser.parseArchive(PackageParser.java:66)
> 	at org.apache.tika.parser.pkg.ZipParser.parse(ZipParser.java:49)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
> 	... 25 more
> I extracted the contents of the zip and ran the AutoDetectParser against each extracted
> file without problems, so the problem lies in the zip archive itself.
> The attached zip is from Project Gutenberg and hence public domain.
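
The innermost frame in the trace above is java.util.zip.ZipEntry.setMethod, which accepts only the STORED (0) and DEFLATED (8) method codes and throws IllegalArgumentException("invalid compression method") for anything else. The following is a minimal sketch of that JDK behaviour in isolation. The class name ZipMethodCheck and the helper isAcceptedByJdk are hypothetical, and the codes 1 and 6 are example values only: the message does not say which compression method moby.zip actually uses.

```java
import java.util.zip.ZipEntry;

// Hypothetical helper class, not part of Tika or Commons Compress.
class ZipMethodCheck {

    /** Returns true if java.util.zip.ZipEntry.setMethod accepts the given method code. */
    static boolean isAcceptedByJdk(int method) {
        try {
            new ZipEntry("probe").setMethod(method);
            return true;
        } catch (IllegalArgumentException e) {
            // Same exception type as the "Caused by" line in the trace.
            return false;
        }
    }

    public static void main(String[] args) {
        // STORED and DEFLATED are the two codes the JDK accepts;
        // 1 and 6 are example codes chosen for illustration.
        int[] methods = {ZipEntry.STORED, ZipEntry.DEFLATED, 1, 6};
        for (int m : methods) {
            System.out.println("method " + m + " accepted: " + isAcceptedByJdk(m));
        }
    }
}
```

An archive reader that copies a method code read from the archive straight into ZipEntry.setMethod would fail in this way on any entry that uses another method, which is consistent with the trace but is not confirmed by it.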

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.