tika-dev mailing list archives

From "Martin Kalcher (JIRA)" <j...@apache.org>
Subject [jira] [Created] (TIKA-1093) [OfficeParser] NullPointerException
Date Mon, 18 Mar 2013 17:08:15 GMT
Martin Kalcher created TIKA-1093:
------------------------------------

             Summary: [OfficeParser] NullPointerException 
                 Key: TIKA-1093
                 URL: https://issues.apache.org/jira/browse/TIKA-1093
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.3
         Environment: % java -version
java version "1.7.0_17"
OpenJDK Runtime Environment (IcedTea7 2.3.8) (ArchLinux build 7.u17_2.3.8-1-x86_64)
OpenJDK 64-Bit Server VM (build 23.7-b01, mixed mode)
            Reporter: Martin Kalcher


OfficeParser throws a NullPointerException when parsing a .doc file.

% java -Djava.awt.headless=false -jar tika-app-1.3.jar -t < test.doc 
Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@29a01add
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
	at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:139)
	at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:400)
	at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:112)
Caused by: java.lang.NullPointerException
	at org.apache.poi.hwpf.sprm.CharacterSprmUncompressor.uncompressCHP(CharacterSprmUncompressor.java:48)
	at org.apache.poi.hwpf.model.StyleSheet.createChp(StyleSheet.java:288)
	at org.apache.poi.hwpf.model.StyleSheet.<init>(StyleSheet.java:121)
	at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:346)
	at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:79)
	at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:186)
	at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:161)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
	... 5 more
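The "Caused by" section shows that CompositeParser wraps the NullPointerException thrown inside POI's CharacterSprmUncompressor in a TikaException, so callers that only log the top-level message lose the real failure point. The following is a minimal sketch of a cause-chain walker for recovering the innermost exception when calling Tika programmatically. It is not part of Tika: the class name RootCause and the method rootCause are hypothetical, and the chain is simulated with a plain java.lang.Exception instead of TikaException so the example compiles with the JDK alone.

```java
import java.util.IdentityHashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of Tika or POI): returns the innermost
 * Throwable in a cause chain, e.g. the NullPointerException that
 * CompositeParser wrapped in the report above.
 */
public class RootCause {

    public static Throwable rootCause(Throwable t) {
        // Remember visited throwables so a cyclic cause chain cannot loop forever.
        Map<Throwable, Boolean> seen = new IdentityHashMap<Throwable, Boolean>();
        Throwable current = t;
        while (current.getCause() != null && seen.put(current, Boolean.TRUE) == null) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Simulate the wrapping seen in the stack trace with plain JDK types
        // (assumption: TikaException is replaced by Exception for this sketch).
        Throwable npe = new NullPointerException();
        Exception wrapped = new Exception(
                "Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser", npe);
        // Prints the class name of the innermost cause.
        System.out.println(rootCause(wrapped).getClass().getName());
    }
}
```

Running main prints java.lang.NullPointerException. Logging the root cause's own stack trace points directly at the POI frames (StyleSheet.createChp and below), which is the detail a bug report like this one needs.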

I cannot share the .doc file at the moment, but I will ask my clients whether I may share it if you need it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
