lucene-solr-user mailing list archives

From Damien Kamerman <dami...@gmail.com>
Subject Re: Unable to index rich-text documents in Solr Cloud
Date Thu, 19 Mar 2015 02:56:03 GMT
I suggest you check your Solr logs for more information about the cause.
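Here is a minimal sketch of that log check. It assumes the node was started with bin/solr, so its log sits at server/logs/solr.log; that path is an assumption, and you can pass another one as the first argument. The program prints only ERROR entries and "Caused by:" lines, because the root cause behind a generic "Bad Request" usually shows up there. It is most useful on the node named in the failing request URL.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

public class SolrLogScan {

    // Keeps ERROR-level entries plus the "Caused by:" lines of chained
    // stack traces. The exact line layout depends on the log4j pattern in
    // use, so this only looks for the level keyword anywhere in the line.
    static List<String> relevantLines(List<String> lines) {
        return lines.stream()
                .filter(line -> line.contains("ERROR")
                        || line.trim().startsWith("Caused by:"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // Assumed location for a node started with bin/solr; pass another
        // path as the first argument if the log lives elsewhere.
        Path log = Paths.get(args.length > 0 ? args[0] : "server/logs/solr.log");
        List<String> lines = Files.readAllLines(log, StandardCharsets.UTF_8);
        relevantLines(lines).forEach(System.out::println);
    }
}
```

Matching on the keyword alone can over-report, for example when a document's text contains "ERROR". The trade-off is that the filter does not depend on one particular log layout.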

On 19 March 2015 at 12:58, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:

> Hi Erick,
>
> No, the PDF file is a test file that contains only one sentence.
>
> I've managed to get it to work by removing startup="lazy" from
> the ExtractingRequestHandler and adding the following lines:
>       <str name="uprefix">ignored_</str>
>       <str name="captureAttr">true</str>
>       <str name="fmap.a">links</str>
>       <str name="fmap.div">ignored_</str>
>
> Does the presence of startup="lazy" affect how the
> ExtractingRequestHandler works, or is it one of the str name values?
>
> Regards,
> Edwin
>
>
> On 18 March 2015 at 23:19, Erick Erickson <erickerickson@gmail.com> wrote:
>
> > Shot in the dark, but is the PDF file significantly larger than the
> > others? Perhaps you're simply exceeding the packet limits for the
> > servlet container?
> >
> > Best,
> > Erick
> >
> > On Wed, Mar 18, 2015 at 12:22 AM, Zheng Lin Edwin Yeo
> > <edwinyeozl@gmail.com> wrote:
> > > Hi everyone,
> > >
> > > I'm having some issues with indexing rich-text documents in Solr
> > > Cloud. When I try to index a PDF or Word document, I get the
> > > following error:
> > >
> > > org.apache.solr.common.SolrException: Bad Request
> > >
> > > request: http://192.168.2.2:8984/solr/logmill/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.2.2%3A8983%2Fsolr%2Flogmill%2F&wt=javabin&version=2
> > >         at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:241)
> > >         at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> > >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> > >         at java.lang.Thread.run(Unknown Source)
> > >
> > >
> > > I'm able to index .xml and .csv files in Solr Cloud with the same
> > > configuration.
> > >
> > > I have set up Solr Cloud using the default ZooKeeper in Solr 5.0.0, and
> > > I have 2 shards with the following details:
> > > Shard1: 192.168.2.2:8983
> > > Shard2: 192.168.2.2:8984
> > >
> > > Prior to this, I was already able to index rich-text documents without
> > > Solr Cloud, and I'm using the same solrconfig.xml and schema.xml,
> > > so my ExtractingRequestHandler is already defined.
> > >
> > > Are there other settings required in order to index rich-text documents
> > > in Solr Cloud?
> > >
> > >
> > > Regards,
> > > Edwin
> >
>
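You can test whether the added parameters made the difference, rather than the removal of startup="lazy". One way is to send the same settings with each request, because request parameters override values in a handler's defaults section. The Java sketch below builds such a request URL and, if you supply a file path, posts the file to it.

It makes several assumptions that do not come from the thread:
- the handler is mapped at the usual /update/extract path,
- the schema's unique key field is called id,
- test-pdf-1 is a hypothetical document id.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class ExtractPost {

    // URL-encodes one query component as UTF-8.
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Builds an /update/extract URL carrying the same settings Edwin added
    // to the handler, so they apply to this request only.
    // Assumptions: /update/extract path, unique key field named "id".
    static String extractUrl(String baseUrl, String collection, String docId) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("literal.id", docId);     // value for the (assumed) "id" field
        params.put("uprefix", "ignored_");   // prefix for fields not in the schema
        params.put("captureAttr", "true");   // index XHTML attributes separately
        params.put("fmap.a", "links");       // rename the extracted "a" field
        params.put("fmap.div", "ignored_");  // send the "div" field to ignored_
        params.put("commit", "true");        // make the document visible at once
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(enc(e.getKey()) + "=" + enc(e.getValue()));
        }
        return baseUrl + "/" + collection + "/update/extract?" + query;
    }

    // Streams the file as the raw request body and returns the HTTP status.
    static int post(String url, Path file, String contentType) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setDoOutput(true);
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", contentType);
        try (OutputStream out = conn.getOutputStream()) {
            Files.copy(file, out);
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }

    public static void main(String[] args) throws IOException {
        // Host, port and collection are taken from the error in the thread.
        String url = extractUrl("http://192.168.2.2:8983/solr", "logmill", "test-pdf-1");
        System.out.println(url);
        if (args.length > 0) {
            System.out.println("HTTP " + post(url, Paths.get(args[0]), "application/pdf"));
        }
    }
}
```

If startup="lazy" is restored and these per-request parameters still avoid the "Bad Request", the parameters were probably what mattered. If the request still fails, the server log on the leader should explain why.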



-- 
Damien Kamerman
