lucene-solr-user mailing list archives

From Damien Kamerman <dami...@gmail.com>
Subject Re: Unable to index rich-text documents in Solr Cloud
Date Thu, 19 Mar 2015 04:49:19 GMT
It sounds like https://issues.apache.org/jira/browse/SOLR-5551
Have you checked the solr.log for all nodes?
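
If it helps, here is a minimal sketch for pulling the ERROR entries (with their stack-trace lines) out of each node's solr.log. It assumes the log format pasted below ("LEVEL - date time; class; message"); the function names and the example paths are hypothetical, so adjust them to your install.

```python
import re
from pathlib import Path

# Start of a log entry in the format shown in this thread,
# e.g. "ERROR - 2015-03-18 15:06:51.019; ..." (assumed format).
ENTRY_START = re.compile(r"^(ERROR|WARN|INFO|DEBUG)\s+-\s+\d{4}-\d{2}-\d{2} ")


def error_entries(log_text):
    """Return each ERROR entry, including its continuation lines."""
    entries, current = [], None
    for line in log_text.splitlines():
        match = ENTRY_START.match(line)
        if match:
            # A new entry begins: close the previous ERROR entry, if any.
            if current is not None:
                entries.append("\n".join(current))
            current = [line] if match.group(1) == "ERROR" else None
        elif current is not None:
            # Continuation line (exception message, stack frame, ...).
            current.append(line)
    if current is not None:
        entries.append("\n".join(current))
    return entries


def scan_nodes(log_paths):
    """Map each existing log file to the ERROR entries found in it."""
    return {
        str(path): error_entries(path.read_text(errors="replace"))
        for path in map(Path, log_paths)
        if path.is_file()
    }
```

For a two-node setup you would call something like scan_nodes(["node1/logs/solr.log", "node2/logs/solr.log"]) (paths are placeholders) and compare what each node logged around the time of the failed request.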

On 19 March 2015 at 14:43, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:

> This is the log output I got from solr.log. I can't seem to figure out
> what's wrong with it. Does anyone know?
>
>
>
> ERROR - 2015-03-18 15:06:51.019; org.apache.solr.update.StreamingSolrClients$1; error
> org.apache.solr.common.SolrException: Bad Request
>
>
>
> request:
>
> http://192.168.2.2:8984/solr/logmill/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.2.2%3A8983%2Fsolr%2Flogmill%2F&wt=javabin&version=2
> <
> http://192.168.2.2:8984/solr/logmill/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.23.72%3A8983%2Fsolr%2Flogmill%2F&wt=javabin&version=2
> >
> at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:241)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> INFO  - 2015-03-18 15:06:51.019; org.apache.solr.update.processor.LogUpdateProcessor; [logmill] webapp=/solr path=/update/extract params={literal.id=C:\Users\edwin\solr-5.0.0\example\exampledocs\solr-word.pdf&resource.name=C:\Users\edwin\solr-5.0.0\example\exampledocs\solr-word.pdf} {add=[C:\Users\edwin\solr-5.0.0\example\exampledocs\solr-word.pdf]} 0 1252
> INFO  - 2015-03-18 15:06:51.029; org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
> INFO  - 2015-03-18 15:06:51.029; org.apache.solr.update.DirectUpdateHandler2; No uncommitted changes. Skipping IW.commit.
> INFO  - 2015-03-18 15:06:51.029; org.apache.solr.core.SolrCore; SolrIndexSearcher has not changed - not re-opening: org.apache.solr.search.SolrIndexSearcher
> INFO  - 2015-03-18 15:06:51.039; org.apache.solr.update.DirectUpdateHandler2; end_commit_flush
> INFO  - 2015-03-18 15:06:51.039; org.apache.solr.update.processor.LogUpdateProcessor; [logmill] webapp=/solr path=/update params={waitSearcher=true&distrib.from=http://192.168.2.2:8983/solr/logmill/&update.distrib=FROMLEADER&openSearcher=true&commit=true&wt=javabin&expungeDeletes=false&commit_end_point=true&version=2&softCommit=false} {commit=} 0 10
> INFO  - 2015-03-18 15:06:51.039; org.apache.solr.update.processor.LogUpdateProcessor; [logmill] webapp=/solr path=/update params={commit=true} {commit=} 0 10
>
>
>
> Regards,
> Edwin
>
>
> On 19 March 2015 at 10:56, Damien Kamerman <damienk@gmail.com> wrote:
>
> > I suggest you check your solr logs for more info as to the cause.
> >
> > On 19 March 2015 at 12:58, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com>
> > wrote:
> >
> > > Hi Erick,
> > >
> > > No, the PDF file is a testing file which only contains 1 sentence.
> > >
> > > I've managed to get it to work by removing startup="lazy" from
> > > the ExtractingRequestHandler and adding the following lines:
> > >       <str name="uprefix">ignored_</str>
> > >       <str name="captureAttr">true</str>
> > >       <str name="fmap.a">links</str>
> > >       <str name="fmap.div">ignored_</str>
> > >
> > > Does the presence of startup="lazy" affect how the
> > > ExtractingRequestHandler works, or is it one of the str name values?
> > >
> > > Regards,
> > > Edwin
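
A quick way to see which handlers are declared with startup="lazy" is to list them from a local copy of solrconfig.xml. This is only a sketch: the helper name is made up, it assumes you have the file on disk (in Solr Cloud the live copy is in ZooKeeper), and it says nothing about whether lazy startup is the actual cause here.

```python
import xml.etree.ElementTree as ET


def lazy_handlers(solrconfig_xml):
    """Return (name, class) for each requestHandler with startup="lazy"."""
    root = ET.fromstring(solrconfig_xml)
    return [
        (handler.get("name"), handler.get("class"))
        for handler in root.iter("requestHandler")
        if handler.get("startup") == "lazy"
    ]
```

Pass it the file contents, for example the bytes read from your copy of solrconfig.xml, and compare the result before and after the change described above.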
> > >
> > >
> > > On 18 March 2015 at 23:19, Erick Erickson <erickerickson@gmail.com>
> > wrote:
> > >
> > > > > Shot in the dark, but is the PDF file significantly larger than the
> > > > > others? Perhaps you're simply exceeding the packet limits for the
> > > > > servlet container?
> > > >
> > > > Best,
> > > > Erick
> > > >
> > > > On Wed, Mar 18, 2015 at 12:22 AM, Zheng Lin Edwin Yeo
> > > > <edwinyeozl@gmail.com> wrote:
> > > > > Hi everyone,
> > > > >
> > > > > > I'm having some issues with indexing rich-text documents in Solr
> > > > > > Cloud. When I try to index a PDF or Word document, I get the
> > > > > > following error:
> > > > >
> > > > >
> > > > > org.apache.solr.common.SolrException: Bad Request
> > > > >
> > > > >
> > > > >
> > > > > request:
> > > > > > http://192.168.2.2:8984/solr/logmill/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.2.2%3A8983%2Fsolr%2Flogmill%2F&wt=javabin&version=2
> > > > > >         at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:241)
> > > > > >         at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> > > > > >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> > > > > >         at java.lang.Thread.run(Unknown Source)
> > > > >
> > > > >
> > > > > > I'm able to index .xml and .csv files in Solr Cloud with the same
> > > > > > configuration.
> > > > > >
> > > > > > I have set up Solr Cloud using the default ZooKeeper in Solr 5.0.0,
> > > > > > and I have 2 shards with the following details:
> > > > > > Shard1: 192.168.2.2:8983
> > > > > > Shard2: 192.168.2.2:8984
> > > > > >
> > > > > > Before this, I was already able to index rich-text documents
> > > > > > without Solr Cloud, and I'm using the same solrconfig.xml and
> > > > > > schema.xml, so my ExtractingRequestHandler is already defined.
> > > > > >
> > > > > > Are there other settings required in order to index rich-text
> > > > > > documents in Solr Cloud?
> > > > >
> > > > >
> > > > > Regards,
> > > > > Edwin
> > > >
> > >
> >
> >
> >
> > --
> > Damien Kamerman
> >
>



-- 
Damien Kamerman
