tika-dev mailing list archives

From Ken Krugler <kkrugler_li...@transpac.com>
Subject Jpeg parsing issues
Date Tue, 07 Sep 2010 03:03:03 GMT
Hi devs,

I recently updated the Bixo project to use Tika 0.8-SNAPSHOT, and a
number of documents that previously parsed successfully now fail.

Many of these failures seem related to image processing. For example:

Caused by: org.apache.tika.exception.TikaException: Can't read JPEG metadata
	at org.apache.tika.parser.image.ImageMetadataExtractor.parseJpeg(ImageMetadataExtractor.java:71)
	at org.apache.tika.parser.jpeg.JpegParser.parse(JpegParser.java:56)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:163)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:110)
	at bixo.parser.TikaCallable.call(TikaCallable.java:63)
	at bixo.parser.TikaCallable.call(TikaCallable.java:1)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.lang.Thread.run(Thread.java:637)
Caused by: com.drew.imaging.jpeg.JpegProcessingException: not a jpeg file
	at com.drew.imaging.jpeg.JpegSegmentReader.readSegments(Unknown Source)
	at com.drew.imaging.jpeg.JpegSegmentReader.<init>(Unknown Source)
	at com.drew.imaging.jpeg.JpegMetadataReader.readMetadata(Unknown Source)
	at org.apache.tika.parser.image.ImageMetadataExtractor.parseJpeg(ImageMetadataExtractor.java:67)
	... 8 more
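
The inner "not a jpeg file" cause suggests that the bytes handed to the JPEG reader did not look like a JPEG at all. As a rough way to check a failing document, here is a minimal sketch that tests whether a stream begins with the JPEG start-of-image marker (bytes 0xFF 0xD8). This is not Tika or metadata-extractor code; the class name JpegSniffer and the method looksLikeJpeg are hypothetical, chosen only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class JpegSniffer {

    /**
     * Returns true if the stream starts with the JPEG start-of-image
     * marker (0xFF 0xD8). Hypothetical helper, not part of Tika.
     * Note: this consumes the first two bytes of the stream.
     */
    public static boolean looksLikeJpeg(InputStream in) throws IOException {
        int first = in.read();   // 0-255, or -1 at end of stream
        int second = in.read();
        return first == 0xFF && second == 0xD8;
    }

    public static void main(String[] args) throws IOException {
        // Illustrative byte arrays standing in for real documents.
        byte[] jpegStart = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0};
        byte[] gifStart = {'G', 'I', 'F', '8'};

        System.out.println(looksLikeJpeg(new ByteArrayInputStream(jpegStart)));
        System.out.println(looksLikeJpeg(new ByteArrayInputStream(gifStart)));
    }
}
```

If a failing document does not pass this check, the problem is probably a wrong content type or a truncated fetch rather than the metadata extraction itself. Because the check consumes bytes, a real caller would need to wrap the stream so it can be reset before parsing.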

Did the Tika-0.7 image parsers (JPEG, GIF, PNG) not extract metadata,  
and thus not run into these types of issues?

Thanks,

-- Ken

--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g





