lucene-solr-user mailing list archives

From: Shawn Heisey
Subject: Re: DataImportHandler - Unable to load Tika Config Processing Document # 1
Date: Wed, 08 Feb 2017 14:45:46 GMT
On 2/6/2017 3:45 PM, Anatharaman, Srinatha (Contractor) wrote:
> I am having below error while trying to index using dataImporthandler
> Data-Config file is mentioned below. zookeeper is not able to read "tikaConfig.xml" on below statement
>   processor="TikaEntityProcessor" tikaConfig="tikaConfig.xml"
> Please help me to resolve this issue
> ion: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to load Tika Config Processing Document # 1
> Caused by: ZkSolrResourceLoader does not support getConfigDir() - likely, what you are trying to do is not supported in ZooKeeper
>         at
>         at org.apache.solr.handler.dataimport.TikaEntityProcessor.firstInit(
>         ... 11 more

This sounds to me like TikaEntityProcessor is incompatible with running
in SolrCloud mode.  Judging by the stack trace, the processor asks the
resource loader for a config directory when it loads its Tika config,
and that does NOT work when the config comes from ZooKeeper, which it
always will when you're running SolrCloud.

I don't know if this is expected or not, or whether it will be
considered a bug.

It is *strongly* recommended to *not* use the Tika that's embedded
within Solr, but instead to do the processing outside of Solr in a
program of your own and index the results.  Tika is very touchy software
that sometimes hangs or crashes as it processes rich-text documents.  If
that happens to the embedded Tika, then Solr itself will also be affected.
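
As a rough sketch of what "outside of Solr" could look like, the Python
below runs the text extraction in a separate process with a timeout, so
a hang or crash in the extractor only costs you that one document, and
then builds a JSON body for Solr's update handler.  Everything named
here is an assumption for illustration and not something from the
original poster's setup: the helper names (extract_text,
build_update_body, post_to_solr), the content_txt field, and whatever
extractor command and update URL you pass in.

```python
import json
import subprocess
import sys
import urllib.request


def extract_text(command, timeout_seconds=60):
    """Run an external extractor command and return its stdout.

    Returns None if the process hangs past the timeout, exits with a
    non-zero status, or cannot be started at all.
    """
    try:
        result = subprocess.run(
            command,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,  # the child is killed when this expires
            check=True,               # non-zero exit raises CalledProcessError
        )
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError, OSError):
        return None
    return result.stdout


def build_update_body(doc_id, text):
    """Build a JSON update body holding one document.

    The field name content_txt is hypothetical; use a field that exists
    in your own schema.
    """
    return json.dumps([{"id": doc_id, "content_txt": text}]).encode("utf-8")


def post_to_solr(update_url, body):
    """POST the JSON body to a Solr update URL supplied by the caller."""
    request = urllib.request.Request(
        update_url,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Stand-in extractor: a child Python process that prints fixed text.
    text = extract_text([sys.executable, "-c", "print('extracted text')"])
    if text is not None:
        print(build_update_body("doc1", text.strip()).decode("utf-8"))
```

In a real setup the command would be whatever extractor you run, for
example the Tika command-line app with its --text option, and the URL
would be your collection's update handler.  A document whose extraction
returns None can be logged and skipped without Solr ever noticing.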

Doing Tika processing outside of Solr is more important with SolrCloud,
because all replicas will need to independently index the data in cloud
mode.  Here's an archive of a message from this list about pretty much
the exact same problem:

Note that this message was sent only a week ago.

