manifoldcf-user mailing list archives

From Bisonti Mario <>
Subject R: How to set Tika with ManifoldCF and Solr
Date Thu, 11 Oct 2018 11:10:20 GMT
Thanks Karl.
I tried, but it doesn’t index the documents.
It seems that it doesn’t see them.

Perhaps it is the “Ignore Tika exception” option, which I don’t know where to set in ManifoldCF.

From: Karl Wright <>
Sent: Thursday, 11 October 2018 12:24
Subject: Re: How to set Tika with ManifoldCF and Solr

Hi Mario,

(1) When you use the Tika server externally, you do not get the boilerpipe HTML extractor
available for configuration and use.  That is because it's external now.
(2) In your Solr connection, you want to uncheck the box that says "use extracting update
handler", and you want to change the output handler from "/update/extract" to just "/update".
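
To illustrate point (2): the plain "/update" handler takes documents whose text has already been extracted (here, by the external Tika server), whereas "/update/extract" hands the raw file to Tika inside Solr. Below is a minimal Python sketch of a request to the plain handler. It is only an illustration of the handler's input, not what ManifoldCF sends internally. The base URL http://localhost:8983/solr and the field names id and content are assumptions; the core name core_share is taken from the log quoted further down.

```python
import json
from urllib.request import Request


def build_update_request(base_url, core, doc_id, text):
    """Build (but do not send) a JSON POST for Solr's plain /update handler.

    The field names "id" and "content" are hypothetical; use the fields
    defined in your own schema.
    """
    url = f"{base_url.rstrip('/')}/{core}/update?commit=true"
    body = json.dumps([{"id": doc_id, "content": text}]).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


req = build_update_request(
    "http://localhost:8983/solr",  # assumed Solr base URL
    "core_share",                  # core name seen in the Solr log
    "doc-1",
    "text already extracted by the external Tika server",
)
print(req.get_method(), req.full_url)
```

Passing the returned object to urllib.request.urlopen would actually send it; the sketch stops short of that so it can be run without a Solr instance.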


On Thu, Oct 11, 2018 at 4:45 AM Bisonti Mario <> wrote:
I would like to use a Tika server started from the command line with ManifoldCF, so that ManifoldCF,
through a Transformation connector, processes documents with Tika and indexes them to the Solr output connector.

I started Tika server:
java -jar /opt/tika/tika-server-1.19.1.jar

Then I created a transformation connection with Tika server: localhost and Tika port: 998,
and the connection works.
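
A quick way to confirm independently of ManifoldCF that the Tika server answers is to call its REST endpoints directly. The following is a minimal Python sketch, assuming the server runs on localhost. Tika server listens on port 9998 when started without a port option, so the sketch uses that value as its default; adjust it if your server was started differently. The helper names tika_request and tika_version are made up for this example.

```python
from urllib.request import Request, urlopen


def tika_request(host, port, path="/version", data=None):
    """Build a request for a Tika server endpoint.

    GET /version returns the server version; PUT /tika with a document
    body returns the extracted text.
    """
    method = "PUT" if data is not None else "GET"
    return Request(
        f"http://{host}:{port}{path}",
        data=data,
        method=method,
        headers={"Accept": "text/plain"},
    )


def tika_version(host="localhost", port=9998, timeout=5):
    """Ask a running Tika server for its version string (needs network access)."""
    with urlopen(tika_request(host, port), timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()
```

Calling tika_version() against a running server should return its version string; a connection error means nothing is listening on that host and port.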

Then I created a job, and in the Connection tab I inserted the transformation I had already created,
before the Solr output.


Note that I don’t see the “Exception” and “Boilerplate” tabs.
Why is this?

Furthermore, if I start the job, I see that Solr hangs with exception:
2018-10-11 10:03:47.268 WARN  (qtp1223240796-17) [   x:core_share] o.e.j.s.HttpChannel /solr/core_share/update/extract
java.lang.NoClassDefFoundError: org/apache/tika/exception/TikaException
        at java.lang.Class.forName0(Native Method) ~[?:?]
        at java.lang.Class.forName( ~[?:?]

In fact, I renamed the Tika .jar
in the folder solr/contrib/extraction/lib to be sure that Solr doesn’t use Tika, because
I would like ManifoldCF to use Tika, but it doesn’t work.

I suppose I have to configure Solr not to use Tika.

How do I do this?

I see <>
but I don’t have Datafari, so in a standard Solr configuration, how can I deactivate
Tika?
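
For reference, in the stock Solr example configurations the Tika-backed handler is the request handler whose class is solr.extraction.ExtractingRequestHandler, normally registered as /update/extract. A minimal Python sketch for listing such handlers in a solrconfig.xml follows; the sample XML is a made-up fragment, not a real configuration, and the function name extracting_handlers is hypothetical.

```python
import xml.etree.ElementTree as ET

# Class name used by the Tika-backed ("Solr Cell") handler in stock configs.
EXTRACT_CLASS = "solr.extraction.ExtractingRequestHandler"


def extracting_handlers(solrconfig_xml):
    """Return the names of request handlers that rely on Tika inside Solr."""
    root = ET.fromstring(solrconfig_xml)
    return [
        handler.get("name")
        for handler in root.iter("requestHandler")
        if handler.get("class") == EXTRACT_CLASS
    ]


# Made-up fragment standing in for a real solrconfig.xml.
sample = """<config>
  <requestHandler name="/select" class="solr.SearchHandler"/>
  <requestHandler name="/update/extract" startup="lazy"
                  class="solr.extraction.ExtractingRequestHandler"/>
</config>"""

print(extracting_handlers(sample))  # ['/update/extract']
```

Any handler listed this way will try to load the Tika classes when a request reaches it, which matches the NoClassDefFoundError above for /solr/core_share/update/extract after the jar was renamed.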

Thanks a lot

