lucene-solr-user mailing list archives

From "Anatharaman, Srinatha (Contractor)" <Srinatha_Ananthara...@comcast.com>
Subject RE: DataImportHandler - Unable to load Tika Config Processing Document # 1
Date Wed, 08 Feb 2017 20:22:46 GMT
Shawn,

Thank you, I will follow Erick's steps.
BTW, I am also trying to ingest the documents using Flume, and Flume uses Morphlines along with Tika.
Will the Flume SolrSink have the same issue?

Currently my SolrSink does not ingest the data, and I do not see any errors in my logs.
I am running into a lot of issues with Solr.

Could you please suggest what the issue with my Flume SolrSink could be?

I have attached another email I sent about the SolrSink issue.

Regards,
~Sri

-----Original Message-----
From: Shawn Heisey [mailto:apache@elyograg.org] 
Sent: Wednesday, February 08, 2017 2:21 PM
To: solr-user@lucene.apache.org
Subject: Re: DataImportHandler - Unable to load Tika Config Processing Document # 1

On 2/8/2017 9:08 AM, Anatharaman, Srinatha (Contractor) wrote:
> Thank you for your reply.
> The other archive message you mentioned was posted by me. I am new to
> Solr. When you say to do the processing in a program outside Solr, what exactly should I do?
>
> I have a lot of text documents that I need to index. What should I apply to these
> documents before loading them into Solr?

Did you not see Erick's reply, where he provided the following link, and said that the program
shown there was a decent guide to writing your own program to handle Tika processing?

https://lucidworks.com/2012/02/14/indexing-with-solrj/

The blog post includes code that talks to a database, which would be fairly easy to remove/change.
 Some knowledge of how to write Java programs is required.  Tika is a Java API, so writing
the program in Java is a prerequisite.
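The following is a minimal sketch of such a client-side program. It assumes the documents really are plain UTF-8 text files, which do not need Tika at all. It uses only the JDK (Java 11+ `java.net.http`) rather than SolrJ, and it is not the code from the blog post. It posts documents to Solr's JSON update handler. The field names `id` and `text` and the core URL are assumptions; use whatever your schema and core actually define.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch: index plain-text files into Solr over HTTP using only the JDK.
// ASSUMPTIONS: the schema has an "id" field and a "text" field, and the
// update URL (e.g. http://localhost:8983/solr/mycore/update) is hypothetical.
class PlainTextIndexer {

    // Escape a string so it can sit inside a JSON string literal.
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }

    // One Solr document as a JSON object with the (assumed) id/text fields.
    static String toSolrDoc(String id, String text) {
        return "{\"id\":\"" + jsonEscape(id) + "\",\"text\":\"" + jsonEscape(text) + "\"}";
    }

    // Read every regular file in dir as UTF-8 and build a JSON array of documents,
    // using the file name as the document id.
    static String buildPayload(Path dir) throws IOException {
        try (Stream<Path> files = Files.list(dir)) {
            List<Path> sorted = files.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
            StringBuilder sb = new StringBuilder("[");
            for (int i = 0; i < sorted.size(); i++) {
                Path p = sorted.get(i);
                String text = Files.readString(p, StandardCharsets.UTF_8);
                if (i > 0) sb.append(',');
                sb.append(toSolrDoc(p.getFileName().toString(), text));
            }
            return sb.append(']').toString();
        }
    }

    // POST the JSON array to the update handler, commit, and return the HTTP status.
    static int post(String updateUrl, String payload) throws IOException, InterruptedException {
        HttpRequest req = HttpRequest.newBuilder(URI.create(updateUrl + "?commit=true"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(payload, StandardCharsets.UTF_8))
                .build();
        HttpResponse<String> resp = HttpClient.newHttpClient()
                .send(req, HttpResponse.BodyHandlers.ofString());
        return resp.statusCode();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: PlainTextIndexer <dir-of-text-files> <solr-update-url>");
            return;
        }
        String payload = buildPayload(Path.of(args[0]));
        System.out.println("HTTP " + post(args[1], payload));
    }
}
```

This sends everything in one request for simplicity. With a large number of documents, you would probably send them in batches of a few hundred and commit once at the end. If some files are PDFs or Office documents, the text extraction step is where the Tika calls from the blog post's program would go.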

The entire point of this idea is to take the Tika processing out of the Solr server(s).  If
Tika runs within Solr, it can cause Solr to hang or crash.  The authors of Tika try as hard
as they can to make sure it works well, but the software is dealing with proprietary data
formats that are not publicly documented.  Sometimes one of those documents can cause Tika
to explode.  A crash in client code won't take down your Solr server, and it is likely easier
to recover from a crash at that level.
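To illustrate what "recovering at that level" can look like, here is a minimal sketch that runs each document's extraction as a time-limited task. It records failures and timeouts instead of letting one bad file stop the whole run. The `extractor` function is a hypothetical stand-in for wherever your program calls Tika; it is not a Tika API. Java cannot forcibly stop a thread, so `cancel(true)` only interrupts the worker. A parser that ignores interrupts keeps running in the background, which is why the threads here are daemons. Protection against hard failures such as a JVM crash or a runaway OutOfMemoryError would require running the extraction in a separate child process instead.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

// Sketch: guard a (hypothetical) per-document extraction step with a timeout
// and per-document error handling, so one bad file does not end the run.
class GuardedExtractor {

    // Extracted text for successes, error descriptions for failures.
    static final class Outcome {
        final Map<String, String> extracted = new LinkedHashMap<>();
        final Map<String, String> failed = new LinkedHashMap<>();
    }

    static Outcome extractAll(List<String> docIds, Function<String, String> extractor, long timeoutMillis) {
        Outcome out = new Outcome();
        // Cached pool: a stuck worker does not block later documents, because new
        // tasks get new threads. Daemon threads cannot keep the JVM alive at exit.
        ExecutorService pool = Executors.newCachedThreadPool(r -> {
            Thread t = new Thread(r, "extractor");
            t.setDaemon(true);
            return t;
        });
        try {
            for (String id : docIds) {
                Future<String> f = pool.submit(() -> extractor.apply(id));
                try {
                    out.extracted.put(id, f.get(timeoutMillis, TimeUnit.MILLISECONDS));
                } catch (TimeoutException e) {
                    f.cancel(true); // interrupts the worker; it may keep running if it ignores interrupts
                    out.failed.put(id, "timed out after " + timeoutMillis + " ms");
                } catch (ExecutionException e) {
                    out.failed.put(id, String.valueOf(e.getCause())); // exception or error thrown by the extractor
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    out.failed.put(id, "interrupted");
                    break;
                }
            }
        } finally {
            pool.shutdownNow();
        }
        return out;
    }
}
```

The failed IDs can be logged and retried or inspected separately. Only the successfully extracted text then gets sent on to Solr.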

Thanks,
Shawn

