lucene-solr-user mailing list archives

From "Anatharaman, Srinatha (Contractor)" <Srinatha_Ananthara...@comcast.com>
Subject RE: DataImportHandler - Unable to load Tika Config Processing Document # 1
Date Wed, 08 Feb 2017 16:08:59 GMT
Shawn,

Thank you for your reply.
The other archived message you mentioned was also posted by me.
I am new to Solr. When you say to do the processing outside of Solr in a program of my own,
what exactly should I do?

I have a lot of text documents that I need to index. What should I apply to these documents
before loading them into Solr?

Regards,
~Sri


-----Original Message-----
From: Shawn Heisey [mailto:apache@elyograg.org] 
Sent: Wednesday, February 08, 2017 9:46 AM
To: solr-user@lucene.apache.org
Subject: Re: DataImportHandler - Unable to load Tika Config Processing Document # 1

On 2/6/2017 3:45 PM, Anatharaman, Srinatha (Contractor) wrote:
> I am getting the error below while trying to index using DataImportHandler.
>
> My data-config file is shown below. ZooKeeper is not able to read
> "tikaConfig.xml" in the following statement:
>
>   processor="TikaEntityProcessor" tikaConfig="tikaConfig.xml"
>
> Please help me resolve this issue.
>
> ion: java.lang.RuntimeException: 
> org.apache.solr.handler.dataimport.DataImportHandlerException: Unable 
> to load Tika Config Processing Document # 1
<snip>
> Caused by: org.apache.solr.common.cloud.ZooKeeperException: ZkSolrResourceLoader does not support getConfigDir() - likely, what you are trying to do is not supported in ZooKeeper mode
>         at org.apache.solr.cloud.ZkSolrResourceLoader.getConfigDir(ZkSolrResourceLoader.java:149)
>         at org.apache.solr.handler.dataimport.TikaEntityProcessor.firstInit(TikaEntityProcessor.java:91)
>         ... 11 more

This sounds to me like something makes TikaEntityProcessor incompatible with running
in SolrCloud mode.  The way that this processor loads its config appears to NOT work when
the config comes from ZooKeeper, which it always will when you're running SolrCloud.

I don't know if this is expected or not, or whether it will be considered a bug.

It is *strongly* recommended to *not* use the Tika that's embedded within Solr, but instead
to do the processing outside of Solr in a program of your own and index the results.  Tika
is very touchy software that sometimes hangs or crashes as it processes rich-text documents.
 If that happens to the embedded Tika, then Solr itself will also be affected.
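
As a rough illustration of "a program of your own", here is a minimal sketch for the
plain-text case, using only the Python standard library. It reads each file in a directory,
turns it into a document, and posts the batch to Solr's JSON update handler. The collection
name "mycollection", the host and port, and the field names "id" and "content" are all
assumptions and must match your own installation and schema. For rich formats such as PDF
or Word, the file-reading step would be replaced by a call to Tika running inside this
external program rather than inside Solr.

```python
import json
import urllib.request
from pathlib import Path

# Assumed values: change host, port, and collection name to match your setup.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycollection/update?commit=true"


def build_docs(directory, pattern="*.txt"):
    """Read each matching file and turn it into a Solr document (a dict).

    The field names "id" and "content" are assumptions; they must exist
    in the schema of the target collection.
    """
    docs = []
    for path in sorted(Path(directory).glob(pattern)):
        # errors="replace" keeps one badly encoded file from aborting the run.
        text = path.read_text(encoding="utf-8", errors="replace")
        docs.append({"id": path.name, "content": text})
    return docs


def post_docs(docs, url=SOLR_UPDATE_URL):
    """Send the documents to Solr's JSON update handler in one request."""
    request = urllib.request.Request(
        url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# Example usage (needs a running Solr; the path is hypothetical):
#   status = post_docs(build_docs("/path/to/text/files"))
```

Because the extraction runs in a separate process, a file that makes the extraction step
hang or crash only affects this program, not the Solr nodes.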

Doing Tika processing outside of Solr is more important with SolrCloud, because all replicas
will need to independently index the data in cloud mode.  Here's an archive of a message from
this list about pretty much the exact same problem:

https://www.mail-archive.com/solr-user@lucene.apache.org/msg127924.html

Note that this message was sent only a week ago.

Thanks,
Shawn

