From: "Jukka Zitting (JIRA)"
To: tika-dev@lucene.apache.org
Date: Sun, 13 Dec 2009 20:23:18 +0000 (UTC)
Subject: [jira] Updated: (TIKA-346) ZipParser throws "invalid compression method" error for some archives

[
https://issues.apache.org/jira/browse/TIKA-346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting updated TIKA-346:
-------------------------------

    Attachment: TIKA-346.patch

The attached patch fixes this problem after recent Commons Compress changes related to COMPRESS-93. We can apply the patch once Commons Compress 1.1 is available.

> ZipParser throws "invalid compression method" error for some archives
> ---------------------------------------------------------------------
>
>                 Key: TIKA-346
>                 URL: https://issues.apache.org/jira/browse/TIKA-346
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.5
>         Environment: Windows XP, JVM 1.6.16
>            Reporter: Robert Trickey
>         Attachments: moby.zip, TIKA-346.patch
>
>
> This could be a bug in the underlying apache-commons code. When trying to parse the attached file to extract text content, an error is thrown with the following stacktrace:
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pkg.ZipParser@1b963c4
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:122)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
>     at my.code.wherever.....
> Caused by: java.lang.IllegalArgumentException: invalid compression method
>     at java.util.zip.ZipEntry.setMethod(ZipEntry.java:209)
>     at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.getNextZipEntry(ZipArchiveInputStream.java:146)
>     at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.getNextEntry(ZipArchiveInputStream.java:188)
>     at org.apache.tika.parser.pkg.PackageParser.parseArchive(PackageParser.java:66)
>     at org.apache.tika.parser.pkg.ZipParser.parse(ZipParser.java:49)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
>     ... 25 more
> I have extracted the content of the zip and ran the autodetect parser against all content files without problems, so it is definitely the zip that is the problem.
> The attached zip is from Project Gutenberg and hence public domain.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
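The "Caused by" section of the stack trace shows ZipArchiveInputStream.getNextZipEntry passing the entry's compression method to java.util.zip.ZipEntry.setMethod. That JDK method accepts only STORED (0) and DEFLATED (8) and throws IllegalArgumentException("invalid compression method") for any other value. The JDK-only Java sketch below illustrates that restriction. It reads the method field from the first local file header of an in-memory ZIP (offset 8, per PKWARE's APPNOTE) and probes ZipEntry.setMethod with it. The class and helper names are hypothetical. Method 9 (Deflate64 in APPNOTE) is only an example value: the report does not say which method moby.zip uses. This is not the TIKA-346 patch.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class CompressionMethodCheck {

    // Local file header signature "PK\003\004" read as a little-endian int.
    static final int LOCAL_HEADER_SIG = 0x04034b50;

    /** Returns the compression method stored in the first local file header. */
    static int firstEntryMethod(byte[] zip) {
        ByteBuffer buf = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.getInt(0) != LOCAL_HEADER_SIG) {
            throw new IllegalArgumentException("not a local file header");
        }
        // Header layout: 0 signature, 4 version needed, 6 flags, 8 compression method.
        return buf.getShort(8) & 0xFFFF;
    }

    /** True if java.util.zip.ZipEntry.setMethod accepts the method (only STORED or DEFLATED). */
    static boolean acceptedByJavaUtilZip(int method) {
        try {
            new ZipEntry("probe").setMethod(method);
            return true;
        } catch (IllegalArgumentException e) {
            return false; // the JDK's message here is "invalid compression method"
        }
    }

    /** Builds a one-entry ZIP in memory; ZipOutputStream defaults to DEFLATED. */
    static byte[] buildZip() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            out.putNextEntry(new ZipEntry("hello.txt"));
            out.write("hello".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] zip = buildZip();
        int method = firstEntryMethod(zip);
        System.out.println("method=" + method + " accepted=" + acceptedByJavaUtilZip(method));

        // Overwrite the method field with 9 (Deflate64) to mimic an entry java.util.zip rejects.
        ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN).putShort(8, (short) 9);
        int patched = firstEntryMethod(zip);
        System.out.println("method=" + patched + " accepted=" + acceptedByJavaUtilZip(patched));
    }
}
```

Running main prints "method=8 accepted=true" followed by "method=9 accepted=false". The second case is the same rejection that surfaces in the trace above, where it is wrapped by Tika as "Unexpected RuntimeException".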