nutch-dev mailing list archives

From Roannel Fernández Hernández <roan...@uci.cu>
Subject Re: [MASSMAIL]Re: Nutch/Solr communication problem
Date Tue, 19 Jan 2016 14:14:12 GMT
Hi 

I think your problem is not related to Solr authentication. The fields of the documents
you send to Solr differ from the fields defined in the Solr schema. Perhaps the Nutch
document has a multivalued field that the Solr schema defines as single-valued, or the
Solr schema has a required field that Nutch does not send, or the primary key has not
been sent, or ...
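
As a rough illustration of the kind of mismatch I mean, here is a small Python sketch that compares one outgoing document with a simplified description of the schema. The SCHEMA dictionary and the check_document helper are hypothetical and exist only for this example. The real constraints live in the schema of your Solr core, and the "id" and "anchor" fields are assumptions; the other field names are taken from the mapping log quoted below.

```python
from typing import Any, Dict, List

# Hypothetical summary of a Solr schema. "id" and "anchor" are assumed names;
# the remaining fields appear in the SolrMappingReader log in this thread.
SCHEMA: Dict[str, Any] = {
    "unique_key": "id",
    "required": {"tstamp"},
    "multivalued": {"anchor"},
    "fields": {"id", "content", "title", "host", "segment",
               "boost", "digest", "tstamp", "anchor"},
}


def check_document(doc: Dict[str, Any], schema: Dict[str, Any] = SCHEMA) -> List[str]:
    """Return a list of human-readable schema problems found in doc."""
    problems: List[str] = []

    # The primary key must always be present.
    if schema["unique_key"] not in doc:
        problems.append(f"missing unique key: {schema['unique_key']}")

    # Required fields that the document does not carry.
    for name in sorted(schema["required"] - set(doc)):
        problems.append(f"missing required field: {name}")

    for name, value in doc.items():
        if name not in schema["fields"]:
            # Field sent by the indexer but not defined in the schema.
            problems.append(f"unknown field: {name}")
        elif (isinstance(value, (list, tuple)) and len(value) > 1
              and name not in schema["multivalued"]):
            # Several values sent for a field declared single-valued.
            problems.append(f"multiple values for single-valued field: {name}")

    return problems
```

Calling check_document on a document built from one of your crawled pages gives a list of suspects to compare against the real schema; an empty list means none of these three kinds of mismatch was found.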

To confirm this, if possible, remove the Solr protection and try again. If you get the
same error, it is not related to Solr authentication and you have to check the fields
sent to Solr.
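
If it helps with that test, the following Python sketch builds a single hand-made update request, with or without HTTP basic authentication, so that the two cases can be compared outside of Nutch. It is only a sketch under assumptions: build_update_request and send are hypothetical helper names, the core URL is the one from the quoted log, and it assumes the core accepts a JSON array of documents on its /update handler.

```python
import base64
import json
import urllib.error
import urllib.request
from typing import Any, Dict, Optional, Tuple


def build_update_request(core_url: str, doc: Dict[str, Any],
                         username: Optional[str] = None,
                         password: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST that adds one document to a Solr core."""
    url = core_url.rstrip("/") + "/update?commit=true"
    body = json.dumps([doc]).encode("utf-8")
    request = urllib.request.Request(url, data=body, method="POST")
    request.add_header("Content-Type", "application/json")
    if username is not None and password is not None:
        # Send the basic-auth credentials with the first request.
        token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
        request.add_header("Authorization", "Basic " + token.decode("ascii"))
    return request


def send(request: urllib.request.Request) -> Tuple[int, str]:
    """Send the request and return (HTTP status, response body)."""
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as error:
        # Solr normally describes the rejected field in the error body.
        return error.code, error.read().decode("utf-8", errors="replace")
```

For example, send(build_update_request("http://127.0.0.1:8983/solr/yah", {"id": "test-1"})) against the unprotected core should return either a success status or an error body naming the problem.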

Regards 

----- Original Message -----

> From: "Zara Parst" <edotservice@gmail.com>
> To: dev@nutch.apache.org
> Sent: Monday, January 18, 2016 3:28:29 PM
> Subject: [MASSMAIL]Re: Nutch/Solr communication problem

> I am using solr 5.4 and nutch 1.11

> On Tue, Jan 19, 2016 at 1:46 AM, Markus Jelsma < markus.jelsma@openindex.io >
> wrote:

> > Hi - it was an answer to your question whether I have ever used it. Yes, I
> > patched and committed it. And therefore I asked if you're using Solr 5 or
> > not. So again, are you using Solr 5?
> 

> > Markus
> 

> > -----Original message-----
> 
> > From: Zara Parst< edotservice@gmail.com >
> 
> > Sent: Monday 18th January 2016 16:16
> 
> > To: dev@nutch.apache.org
> 
> > Subject: Re: Nutch/Solr communication problem
> 

> > Would you mind sharing that patch?
> 

> > On Mon, Jan 18, 2016 at 8:28 PM, Markus Jelsma <markus.jelsma@openindex.io> wrote:
> 
> > Yes, I have used it; I made the damn patch myself years ago, and I used the
> > same configuration. Command line or config work the same.
> 

> > Markus
> 

> > -----Original message-----
> 

> > From: Zara Parst <edotservice@gmail.com>
> 

> > Sent: Monday 18th January 2016 12:55
> 

> > To: dev@nutch.apache.org
> 

> > Subject: Re: Nutch/Solr communication problem
> 

> > Dear Markus,
> 

> > Are you just speaking blindly or what? My concern is: did you ever try
> > pushing an index to a Solr instance that is password protected? If yes, can
> > you tell me what config you used? If you did that in a config file, let me
> > know, or if you did it through a command, please let me know.
> 

> > thanks
> 

> > On Mon, Jan 18, 2016 at 4:50 PM, Markus Jelsma <markus.jelsma@openindex.io> wrote:
> 

> > Hi - This doesn't look like an HTTP basic authentication problem. Are you
> > running Solr 5.x?
> 

> > Markus
> 

> > -----Original message-----
> 

> > From: Zara Parst <edotservice@gmail.com>
> 

> > Sent: Monday 18th January 2016 11:55
> 

> > To: dev@nutch.apache.org
> 

> > Subject: Re: Nutch/Solr communication problem
> 

> > SolrIndexWriter
> 

> > solr.server.type : Type of SolrServer to communicate with (default http
> > however options include cloud, lb and concurrent)
> 

> > solr.server.url : URL of the Solr instance (mandatory)
> 

> > solr.zookeeper.url : URL of the Zookeeper URL (mandatory if cloud value for
> > solr.server.type)
> 

> > solr.loadbalance.urls : Comma-separated string of Solr server strings to be
> > used (madatory if lb value for solr.server.type)
> 

> > solr.mapping.file : name of the mapping file for fields (default
> > solrindex-mapping.xml)
> 

> > solr.commit.size : buffer size when sending to Solr (default 1000)
> 

> > solr.auth : use authentication (default false)
> 

> > solr.auth.username : username for authentication
> 

> > solr.auth.password : password for authentication
> 

> > 2016-01-17 19:19:42,973 INFO indexer.IndexerMapReduce - IndexerMapReduce:
> > crawldb: crawlDbyah/crawldb
> 

> > 2016-01-17 19:19:42,973 INFO indexer.IndexerMapReduce - IndexerMapReduce:
> > linkdb: crawlDbyah/linkdb
> 

> > 2016-01-17 19:19:42,973 INFO indexer.IndexerMapReduce - IndexerMapReduces:
> > adding segment: crawlDbyah/segments/20160117191906
> 

> > 2016-01-17 19:19:42,975 WARN indexer.IndexerMapReduce - Ignoring linkDb for
> > indexing, no linkDb found in path: crawlDbyah/linkdb
> 

> > 2016-01-17 19:19:43,807 WARN conf.Configuration -
> > file:/tmp/hadoop-rakesh/mapred/staging/rakesh2114349538/.staging/job_local2114349538_0001/job.xml:an
> > attempt to override final parameter:
> > mapreduce.job.end-notification.max.retry.interval; Ignoring.
> 

> > 2016-01-17 19:19:43,809 WARN conf.Configuration -
> > file:/tmp/hadoop-rakesh/mapred/staging/rakesh2114349538/.staging/job_local2114349538_0001/job.xml:an
> > attempt to override final parameter:
> > mapreduce.job.end-notification.max.attempts; Ignoring.
> 

> > 2016-01-17 19:19:43,963 WARN conf.Configuration -
> > file:/tmp/hadoop-rakesh/mapred/local/localRunner/rakesh/job_local2114349538_0001/job_local2114349538_0001.xml:an
> > attempt to override final parameter:
> > mapreduce.job.end-notification.max.retry.interval; Ignoring.
> 

> > 2016-01-17 19:19:43,980 WARN conf.Configuration -
> > file:/tmp/hadoop-rakesh/mapred/local/localRunner/rakesh/job_local2114349538_0001/job_local2114349538_0001.xml:an
> > attempt to override final parameter:
> > mapreduce.job.end-notification.max.attempts; Ignoring.
> 

> > 2016-01-17 19:19:44,260 INFO anchor.AnchorIndexingFilter - Anchor
> > deduplication is: off
> 

> > 2016-01-17 19:19:45,128 INFO indexer.IndexWriters - Adding
> > org.apache.nutch.indexwriter.solr.SolrIndexWriter
> 

> > 2016-01-17 19:19:45,148 INFO solr.SolrUtils - Authenticating as: radmin
> 

> > 2016-01-17 19:19:45,318 INFO solr.SolrMappingReader - source: content dest:
> > content
> 

> > 2016-01-17 19:19:45,318 INFO solr.SolrMappingReader - source: title dest:
> > title
> 

> > 2016-01-17 19:19:45,318 INFO solr.SolrMappingReader - source: host dest:
> > host
> 

> > 2016-01-17 19:19:45,319 INFO solr.SolrMappingReader - source: segment dest:
> > segment
> 

> > 2016-01-17 19:19:45,319 INFO solr.SolrMappingReader - source: boost dest:
> > boost
> 

> > 2016-01-17 19:19:45,319 INFO solr.SolrMappingReader - source: digest dest:
> > digest
> 

> > 2016-01-17 19:19:45,319 INFO solr.SolrMappingReader - source: tstamp dest:
> > tstamp
> 

> > 2016-01-17 19:19:45,360 INFO solr.SolrIndexWriter - Indexing 2 documents
> 

> > 2016-01-17 19:19:45,507 INFO solr.SolrIndexWriter - Indexing 2 documents
> 

> > 2016-01-17 19:19:45,526 WARN mapred.LocalJobRunner -
> > job_local2114349538_0001
> 

> > java.lang.Exception: java.io.IOException
> 

> > at
> > org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
> 

> > at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
> 

> > Caused by: java.io.IOException
> 

> > at
> > org.apache.nutch.indexwriter.solr.SolrIndexWriter.makeIOException(SolrIndexWriter.java:171)
> 

> > at
> > org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:157)
> 

> > at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:115)
> 

> > at
> > org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:44)
> 

> > at
> > org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.close(ReduceTask.java:502)
> 

> > at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:456)
> 

> > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
> 

> > at
> > org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
> 

> > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> 

> > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 

> > at
> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 

> > at
> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 

> > at java.lang.Thread.run(Thread.java:745)
> 

> > Caused by: org.apache.solr.client.solrj.SolrServerException: IOException
> > occured when talking to server at: http://127.0.0.1:8983/solr/yah
> 

> > at
> > org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:566)
> 

> > at
> > org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
> 

> > at
> > org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
> 

> > at
> > org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
> 

> > at
> > org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:153)
> 

> > ... 11 more
> 

> > Caused by: org.apache.http.client.ClientProtocolException
> 

> > at
> > org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:186)
> 

> > at
> > org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
> 

> > at
> > org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
> 

> > at
> > org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
> 

> > at
> > org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:448)
> 

> > ... 15 more
> 

> > Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot
> > retry request with a non-repeatable request entity.
> 

> > at
> > org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:208)
> 

> > at
> > org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:195)
> 

> > at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:86)
> 

> > at
> > org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:108)
> 

> > at
> > org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
> 

> > ... 19 more
> 

> > 2016-01-17 19:19:46,055 ERROR indexer.IndexingJob - Indexer:
> > java.io.IOException: Job failed!
> 

> > at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)
> 

> > at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:145)
> 

> > at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:228)
> 

> > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> 

> > at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:237)
> 

> > On Mon, Jan 18, 2016 at 4:15 PM, Markus Jelsma <markus.jelsma@openindex.io> wrote:
> 

> > Hi - can you post the log output?
> 

> > Markus
> 

> > -----Original message-----
> 

> > From: Zara Parst <edotservice@gmail.com>
> 

> > Sent: Monday 18th January 2016 2:06
> 

> > To: dev@nutch.apache.org
> 

> > Subject: Nutch/Solr communication problem
> 

> > Hi everyone,
> 

> > I have a situation here: I am using Nutch 1.11 and Solr 5.4.
> 

> > Solr is protected by a user name and password. I am passing the credentials
> > to Solr using the following command:
> 

> > bin/crawl -i -Dsolr.server.url=http://localhost:8983/solr/abc -D solr.auth=true
> > -Dsolr.auth.username=xxxx -Dsolr.auth.password=xxx url crawlDbyah 1
> 

> > and I always get the same problem. Please help me feed data to the
> > protected Solr.
> 

> > Below is the error message.
> 

> > Indexer: starting at 2016-01-17 19:01:12
> 

> > Indexer: deleting gone documents: false
> 

> > Indexer: URL filtering: false
> 

> > Indexer: URL normalizing: false
> 

> > Active IndexWriters :
> 

> > SolrIndexWriter
> 

> > solr.server.type : Type of SolrServer to communicate with (default http
> > however options include cloud, lb and concurrent)
> 

> > solr.server.url : URL of the Solr instance (mandatory)
> 

> > solr.zookeeper.url : URL of the Zookeeper URL (mandatory if cloud value for
> > solr.server.type)
> 

> > solr.loadbalance.urls : Comma-separated string of Solr server strings to be
> > used (madatory if lb value for solr.server.type)
> 

> > solr.mapping.file : name of the mapping file for fields (default
> > solrindex-mapping.xml)
> 

> > solr.commit.size : buffer size when sending to Solr (default 1000)
> 

> > solr.auth : use authentication (default false)
> 

> > solr.auth.username : username for authentication
> 

> > solr.auth.password : password for authentication
> 

> > Indexing 2 documents
> 

> > Indexing 2 documents
> 

> > Indexer: java.io.IOException: Job failed!
> 

> > at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)
> 

> > at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:145)
> 

> > at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:228)
> 

> > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> 

> > at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:237)
> 

> > I also tried setting the username and password in nutch-default.xml, but
> > got the same error. Please help me out.
> 
