tika-dev mailing list archives

From "Robert Trickey (JIRA)" <j...@apache.org>
Subject [jira] Created: (TIKA-346) ZipParser throws "invalid compression method" error for some archives
Date Thu, 10 Dec 2009 12:52:18 GMT
ZipParser throws "invalid compression method" error for some archives
---------------------------------------------------------------------

                 Key: TIKA-346
                 URL: https://issues.apache.org/jira/browse/TIKA-346
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 0.5
         Environment: Windows XP, JVM 1.6.16
            Reporter: Robert Trickey
         Attachments: moby.zip

This may be a bug in the underlying Apache Commons Compress code. Parsing the attached
file to extract text content throws an exception with the following stack trace:

org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pkg.ZipParser@1b963c4
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:122)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
	at my.code.wherever.....
Caused by: java.lang.IllegalArgumentException: invalid compression method
	at java.util.zip.ZipEntry.setMethod(ZipEntry.java:209)
	at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.getNextZipEntry(ZipArchiveInputStream.java:146)
	at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.getNextEntry(ZipArchiveInputStream.java:188)
	at org.apache.tika.parser.pkg.PackageParser.parseArchive(PackageParser.java:66)
	at org.apache.tika.parser.pkg.ZipParser.parse(ZipParser.java:49)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
	... 25 more
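
The trace shows java.util.zip.ZipEntry.setMethod rejecting the entry's compression method; that setter accepts only STORED (0) and DEFLATED (8). One way to narrow this down is to list the method code recorded for each entry in the archive's central directory. The following is a minimal sketch, not Tika or Commons Compress code: the class and method names are hypothetical, and it assumes a non-Zip64 archive with UTF-8 (or plain ASCII) entry names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical diagnostic helper; not part of Tika or Commons Compress.
class ZipMethodLister {
    private static final int EOCD_SIG = 0x06054b50; // end-of-central-directory record
    private static final int CEN_SIG = 0x02014b50;  // central directory file header

    /** Returns entry name -> compression method code, read from the central directory. */
    static Map<String, Integer> listMethods(byte[] zip) {
        ByteBuffer buf = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);

        // Scan backwards for the end-of-central-directory record (at least 22 bytes long).
        int eocd = -1;
        for (int i = zip.length - 22; i >= 0; i--) {
            if (buf.getInt(i) == EOCD_SIG) {
                eocd = i;
                break;
            }
        }
        if (eocd < 0) {
            throw new IllegalArgumentException("no end-of-central-directory record found");
        }

        int count = buf.getShort(eocd + 10) & 0xFFFF; // total number of entries
        int pos = buf.getInt(eocd + 16);               // offset of the central directory

        Map<String, Integer> methods = new LinkedHashMap<>();
        for (int n = 0; n < count; n++) {
            if (buf.getInt(pos) != CEN_SIG) {
                throw new IllegalArgumentException("bad central directory header at " + pos);
            }
            int method = buf.getShort(pos + 10) & 0xFFFF;     // compression method code
            int nameLen = buf.getShort(pos + 28) & 0xFFFF;
            int extraLen = buf.getShort(pos + 30) & 0xFFFF;
            int commentLen = buf.getShort(pos + 32) & 0xFFFF;
            // Assumption: names are decoded as UTF-8; older archives may use CP437.
            String name = new String(zip, pos + 46, nameLen, StandardCharsets.UTF_8);
            methods.put(name, method);
            pos += 46 + nameLen + extraLen + commentLen;      // advance to the next header
        }
        return methods;
    }

    /** True for the two method codes that java.util.zip.ZipEntry.setMethod accepts. */
    static boolean isSupportedByJavaUtilZip(int method) {
        return method == 0 || method == 8; // STORED or DEFLATED
    }
}
```

Passing the bytes of the attached archive to listMethods and checking each code with isSupportedByJavaUtilZip should show which entries, if any, use a method other than 0 or 8.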

I extracted the contents of the zip and ran the autodetect parser against every extracted
file without problems, so the failure is specific to the zip archive itself.

The attached zip is from Project Gutenberg and hence public domain.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

