nutch-dev mailing list archives

From Edward Quick <>
Subject problems parsing pdf's
Date Sun, 07 Sep 2008 20:59:51 GMT


I keep getting the following errors when parsing PDFs:

Error parsing: $FILE/Three+wishes.pdf:
failed(2,0): Can't be handled as pdf document. java.lang.ClassCastException: org.pdfbox.pdmodel.encryption.PDEncryptionDictionary
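The exception names org.pdfbox.pdmodel.encryption.PDEncryptionDictionary, which suggests (though does not prove) that the failing files are encrypted. Below is a minimal Java sketch for testing that guess across a batch of files. It is a heuristic, not anything Nutch or PDFBox does internally. It only scans the raw bytes for the /Encrypt key that an encrypted PDF declares in its trailer (or cross-reference stream) dictionary, so an unusual file could still produce a false result. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Heuristic (an assumption, not PDFBox's own logic): an encrypted PDF
 * carries an /Encrypt entry in its trailer or cross-reference stream
 * dictionary, so a raw byte scan for that key flags likely-encrypted files.
 */
public class EncryptCheck {

    // True if the raw file bytes contain the "/Encrypt" key anywhere.
    public static boolean looksEncrypted(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to exactly one char, so binary data is safe to scan.
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        return raw.contains("/Encrypt");
    }

    public static void main(String[] args) throws IOException {
        // Each argument is a path to a local copy of one problem PDF.
        for (String arg : args) {
            Path p = Paths.get(arg);
            byte[] bytes = Files.readAllBytes(p);
            System.out.println((looksEncrypted(bytes) ? "encrypted? " : "plain      ") + p);
        }
    }
}
```

For example, `java EncryptCheck *.pdf` over a local copy of the problem files would show whether the ClassCastExceptions line up with the files flagged as encrypted.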

fetch of $FILE/BAUWS.pdf
failed with: java.lang.NoClassDefFoundError: javax/media/jai/PlanarImage
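A NoClassDefFoundError means a class needed at run time (presumably by PDFBox) could not be loaded. Here that class is javax.media.jai.PlanarImage from Java Advanced Imaging (JAI), which is not part of the standard JDK. The minimal sketch below checks whether a given classpath can see that class, using only the JDK's Class.forName. The ClasspathCheck name is hypothetical. Nutch loads plugin jars through its own plugin class loader, so the result is only meaningful if you run the check with the same jars the parse-pdf plugin uses. That is an assumption about how your install is laid out.

```java
/**
 * Minimal classpath probe: reports whether this JVM's class loader can
 * load a named class. The default name comes from the error above.
 */
public class ClasspathCheck {

    // True if the class can be found; 'false' skips running static initializers.
    public static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError for broken dependencies.
            return false;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "javax.media.jai.PlanarImage";
        System.out.println(name + (isLoadable(name) ? " found" : " NOT found") + " on the classpath");
    }
}
```

For example, `java -cp .:<jars used by parse-pdf> ClasspathCheck` prints whether the JAI class is reachable. "NOT found" points to the JAI jars being absent from that classpath.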

I have applied the patch mentioned here =>
but it didn't stop the ClassCastExceptions for every file.

Currently there are about 243 PDFs on our intranet that I can't get Nutch to parse :-(


