tika-dev mailing list archives

From Ken Krugler <kkrugler_li...@transpac.com>
Subject Re: Tika 0.7 And Solr
Date Wed, 07 Jul 2010 16:44:42 GMT
Hi Rohan,

On Jul 7, 2010, at 4:01am, rohanpatil wrote:

> I am using the Solr distribution provided by Lucid Imagination. It
> ships with Tika 0.5 and uses PDFBox 0.8, which has problems extracting
> content from large (>200 KB) v1.5 PDFs.
>
> I saw that PDFBox 1.x resolves this issue.
> When I upgraded the extraction jars, I got the following error.
>
> Jul 7, 2010 2:38:56 PM org.apache.solr.common.SolrException log
> SEVERE: java.lang.NoClassDefFoundError:
> org/bouncycastle/jce/provider/BouncyCastleProvider

Back in January I'd run into the same issue:

> I believe the issue is that the PDFBox pom.xml declares the  
> dependency on the missing BouncyCastleProvider jar as "optional".
>
>    <dependency>
>      <groupId>bouncycastle</groupId>
>      <artifactId>bcprov-jdk14</artifactId>
>      <version>136</version>
>      <optional>true</optional>
>    </dependency>
>
> As explained in the Maven documentation, this means that Tika needs  
> to explicitly include the jar:
>
> http://maven.apache.org/guides/introduction/introduction-to-optional-and-excludes-dependencies.html
>
> I see a few other optional dependencies in the PDFBox pom.xml, but  
> perhaps the only one that's really critical is the above.
>
> Let me know if anybody else has input on this, otherwise I'll file  
> an issue and fix it.

To fix it, you could manually install the bcprov-jdk14 jar (version 136
in the pom.xml snippet quoted above) alongside the extraction jars you
upgraded.
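
If you build with Maven rather than copying jars by hand, a minimal
sketch of the explicit declaration would be something like the
following in the consuming project's pom.xml. The coordinates are
copied from the PDFBox pom.xml quoted above; I'm assuming they still
match the PDFBox 1.x release you upgraded to, so check that pom first.

    <!-- Declared explicitly because PDFBox marks it as optional, -->
    <!-- so Maven does not pull it in transitively. -->
    <dependency>
      <groupId>bouncycastle</groupId>
      <artifactId>bcprov-jdk14</artifactId>
      <version>136</version>
    </dependency>

Leaving out the <optional> element is the point: the jar then ends up
on the runtime classpath of whatever depends on your project.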

-- Ken

--------------------------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g




