tika-dev mailing list archives

From "Robert Trickey (JIRA)" <j...@apache.org>
Subject [jira] Updated: (TIKA-346) ZipParser throws "invalid compression method" error for some archives
Date Thu, 10 Dec 2009 12:52:18 GMT

     [ https://issues.apache.org/jira/browse/TIKA-346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Trickey updated TIKA-346:
--------------------------------

    Attachment: moby.zip

> ZipParser throws "invalid compression method" error for some archives
> ---------------------------------------------------------------------
>
>                 Key: TIKA-346
>                 URL: https://issues.apache.org/jira/browse/TIKA-346
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.5
>         Environment: Windows XP, JVM 1.6.16
>            Reporter: Robert Trickey
>         Attachments: moby.zip
>
>
> This could be a bug in the underlying Apache Commons Compress code. When trying to parse the attached file to extract text content, an error is thrown with the following stack trace:
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pkg.ZipParser@1b963c4
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:122)
> 	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
> 	at my.code.wherever.....
> Caused by: java.lang.IllegalArgumentException: invalid compression method
> 	at java.util.zip.ZipEntry.setMethod(ZipEntry.java:209)
> 	at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.getNextZipEntry(ZipArchiveInputStream.java:146)
> 	at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.getNextEntry(ZipArchiveInputStream.java:188)
> 	at org.apache.tika.parser.pkg.PackageParser.parseArchive(PackageParser.java:66)
> 	at org.apache.tika.parser.pkg.ZipParser.parse(ZipParser.java:49)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
> 	... 25 more
> I have extracted the contents of the zip and run the auto-detect parser against all of the content files without problems, so the problem is definitely in the zip archive itself.
> The attached zip is from Project Gutenberg and hence public domain.
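
The trace above shows java.util.zip.ZipEntry.setMethod rejecting the value it was given. That method only accepts STORED (0) and DEFLATED (8), so one plausible cause, not confirmed anywhere in this report, is that an entry in the archive uses some other compression method. The following is a minimal diagnostic sketch under that assumption: it reads the 2-byte compression method at offset 8 of the first local file header and checks whether java.util.zip would accept it. The class and method names (ZipMethodProbe, firstEntryMethod, acceptedByJavaUtilZip, sampleZip) are hypothetical and are not part of Tika or Commons Compress; the sample archive is generated in memory and is not moby.zip.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipMethodProbe {

    /**
     * Returns the compression method stored in the first local file header,
     * or -1 if the data does not start with the "PK\003\004" signature.
     */
    static int firstEntryMethod(byte[] data) {
        if (data.length < 10) {
            return -1;
        }
        if (data[0] != 'P' || data[1] != 'K' || data[2] != 3 || data[3] != 4) {
            return -1;
        }
        // Header layout: signature (4 bytes), version needed (2), flags (2),
        // then the compression method (2, little-endian) at offset 8.
        return (data[8] & 0xFF) | ((data[9] & 0xFF) << 8);
    }

    /** True if java.util.zip.ZipEntry accepts the given method code. */
    static boolean acceptedByJavaUtilZip(int method) {
        try {
            new ZipEntry("probe").setMethod(method);
            return true;
        } catch (IllegalArgumentException e) {
            // Same exception type as in the reported stack trace.
            return false;
        }
    }

    /** Builds a small in-memory archive with one entry, using the default method. */
    static byte[] sampleZip() {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buf)) {
            zip.putNextEntry(new ZipEntry("hello.txt"));
            zip.write("hello".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        int method = firstEntryMethod(sampleZip());
        System.out.println("method=" + method
                + " accepted=" + acceptedByJavaUtilZip(method));
    }
}
```

To check a real archive, the same probe could be fed the leading bytes of that file instead of sampleZip(). It inspects only the first entry, so an archive with mixed methods would need a walk over every header.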

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

