lucene-solr-user mailing list archives

From Zheng Lin Edwin Yeo <edwinye...@gmail.com>
Subject Re: Unable to index rich-text documents in Solr Cloud
Date Thu, 19 Mar 2015 01:58:54 GMT
Hi Erick,

No, the PDF file is a testing file which only contains 1 sentence.

I've managed to get it to work by removing startup="lazy" from
the ExtractingRequestHandler and adding the following lines:
      <str name="uprefix">ignored_</str>
      <str name="captureAttr">true</str>
      <str name="fmap.a">links</str>
      <str name="fmap.div">ignored_</str>
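
For context, here is a minimal sketch of how the full handler definition in
solrconfig.xml might look after this change. The handler path
/update/extract, the class name, and the enclosing "defaults" list are
assumptions based on the stock Solr example configuration; they are not
stated in this message.

```xml
<!-- Sketch only: name, class, and the "defaults" wrapper are assumed
     from the stock Solr example config, not taken from this thread. -->
<requestHandler name="/update/extract"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- Prefix fields that are not in the schema with "ignored_" -->
    <str name="uprefix">ignored_</str>
    <!-- Capture attributes of extracted elements into fields -->
    <str name="captureAttr">true</str>
    <!-- Map content of <a> elements to the "links" field -->
    <str name="fmap.a">links</str>
    <!-- Map content of <div> elements to the "ignored_" field -->
    <str name="fmap.div">ignored_</str>
  </lst>
</requestHandler>
```

A setup like this presumably also relies on schema.xml defining a "links"
field and an "ignored_*" dynamic field, so that metadata fields unknown to
the schema are accepted rather than rejected.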

Does the presence of startup="lazy" affect how the
ExtractingRequestHandler functions, or was it one of the str values
that made the difference?

Regards,
Edwin


On 18 March 2015 at 23:19, Erick Erickson <erickerickson@gmail.com> wrote:

> Shot in the dark, but is the PDF file significantly larger than the
> others? Perhaps you're simply exceeding the packet limits for the
> servlet container?
>
> Best,
> Erick
>
> On Wed, Mar 18, 2015 at 12:22 AM, Zheng Lin Edwin Yeo
> <edwinyeozl@gmail.com> wrote:
> > Hi everyone,
> >
> > I'm having some issues with indexing rich-text documents in Solr
> > Cloud. When I try to index a PDF or Word document, I get the following
> > error:
> >
> >
> > org.apache.solr.common.SolrException: Bad Request
> >
> >
> >
> > request: http://192.168.2.2:8984/solr/logmill/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.2.2%3A8983%2Fsolr%2Flogmill%2F&wt=javabin&version=2
> >         at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:241)
> >         at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> >         at java.lang.Thread.run(Unknown Source)
> >
> >
> > I'm able to index .xml and .csv files in Solr Cloud with the same
> > configuration.
> >
> > I have set up Solr Cloud using the default ZooKeeper in Solr 5.0.0, and
> > I have 2 shards with the following details:
> > Shard1: 192.168.2.2:8983
> > Shard2: 192.168.2.2:8984
> >
> > Prior to this, I was already able to index rich-text documents without
> > Solr Cloud, and I'm using the same solrconfig.xml and schema.xml,
> > so my ExtractingRequestHandler is already defined.
> >
> > Are there other settings required in order to index rich-text documents
> > in Solr Cloud?
> >
> >
> > Regards,
> > Edwin
>
