Hi
the steps shown at
http://wiki.apache.org/solr/ExtractingRequestHandler#Upgrading_Tika
worked for me. I use the following versions for extracting MS Office and
PDF files:
tika 0.9
pdfbox 1.6.0
poi 3.8beta3
So far this combination has turned out to be the most reliable.
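The upgrade essentially comes down to swapping jars in Solr's extraction contrib. Below is a minimal Python sketch of that swap. It is not the exact procedure from the wiki page. It assumes that the Tika, PDFBox and POI jars live under contrib/extraction/lib in the unpacked Solr 3.3.0 folder, and that the new jars were downloaded into a separate directory. All paths and the helper names swap_jars and artifact_name are hypothetical. For each new jar, the script moves any old jar with the same artifact name into a backup folder and then copies the new jar in.

```python
from __future__ import annotations

import re
import shutil
from pathlib import Path

# Splits "tika-core-0.8.jar" into artifact "tika-core" and version "0.8".
# Assumption: jar names follow the usual "<artifact>-<version>.jar" pattern.
JAR_PATTERN = re.compile(r"^(?P<name>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def artifact_name(jar_filename: str) -> str:
    """Return the artifact part of a jar file name, without its version."""
    match = JAR_PATTERN.match(jar_filename)
    # Fall back to the bare file stem if the name carries no version.
    return match.group("name") if match else Path(jar_filename).stem


def swap_jars(lib_dir: Path, new_jars_dir: Path, backup_dir: Path) -> list[str]:
    """Replace jars in lib_dir with same-named artifacts from new_jars_dir.

    Old jars are moved to backup_dir instead of being deleted, so the
    previous setup can be restored. Returns the sorted names of the copied jars.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    new_jars = sorted(new_jars_dir.glob("*.jar"))
    new_names = {artifact_name(jar.name) for jar in new_jars}

    # Build the list first so the directory is not modified while it is iterated.
    for old in list(lib_dir.glob("*.jar")):
        if artifact_name(old.name) in new_names:
            shutil.move(str(old), str(backup_dir / old.name))

    # Copy the new versions in. Jars without a replacement are left untouched.
    for new in new_jars:
        shutil.copy2(new, lib_dir / new.name)
    return [jar.name for jar in new_jars]
```

A call might look like swap_jars(Path("apache-solr-3.3.0/contrib/extraction/lib"), Path("new-jars"), Path("old-jars")) (hypothetical paths). After the swap, restart the bundled Jetty so Solr loads the new jars. Keeping the old jars in a backup folder makes it easy to roll back if a version combination turns out to be less stable.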
Cheers
-Tom
On 08/19/2011 01:44 PM, nirnaydewan wrote:
> Since the formatting issue when extracting content from PDF & DOC
> files has been fixed in Tika 0.9, I want to integrate it into my existing Solr project.
>
> Please let me know the steps.
>
> All I have is the downloaded Solr 3.3.0 folder, and I am currently using only
> the bundled Jetty server. This version ships with Tika 0.8.
>
> All I need are simple steps for replacing 0.8 with 0.9.
>
> Thanks
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Tika-0-9-integration-in-Solr-3-3-0-tp3267799p3267799.html
> Sent from the Apache Tika - Development mailing list archive at Nabble.com.
>
--
Author of the book "Plone 3 Multimedia" - http://amzn.to/dtrp0C
Tom Gross
email..........tom@toms-projekte.de
skype.....................tom_gross
web.........http://toms-projekte.de
blog...http://blog.toms-projekte.de