nutch-dev mailing list archives

From Zara Parst <edotserv...@gmail.com>
Subject Re: Nutch/Solr communication problem
Date Mon, 18 Jan 2016 15:16:03 GMT
Would you mind sharing that patch?

On Mon, Jan 18, 2016 at 8:28 PM, Markus Jelsma <markus.jelsma@openindex.io>
wrote:

> Yes, I have used it. I made the damn patch myself years ago, and I used the
> same configuration. Command line or config file, both work the same.
> Markus
>
> -----Original message-----
> From: Zara Parst<edotservice@gmail.com>
> Sent: Monday 18th January 2016 12:55
> To: dev@nutch.apache.org
> Subject: Re: Nutch/Solr communication problem
>
> Dear Markus,
>
> Are you just speaking blindly or what? My concern is: did you ever try
> pushing an index to a Solr instance that is password protected? If yes, can
> you tell me what config you used? If you did it in a config file, let me
> know, and if you did it through the command line, please let me know that too.
>
> thanks
>
> On Mon, Jan 18, 2016 at 4:50 PM, Markus Jelsma <markus.jelsma@openindex.io>
> wrote:
> Hi - This doesn't look like an HTTP basic authentication problem. Are you
> running Solr 5.x?
>
> Markus
>
> -----Original message-----
>
> From: Zara Parst<edotservice@gmail.com>
>
> Sent: Monday 18th January 2016 11:55
>
> To: dev@nutch.apache.org
>
> Subject: Re: Nutch/Solr communication problem
>
> SolrIndexWriter
>
>         solr.server.type : Type of SolrServer to communicate with (default
> http however options include cloud, lb and concurrent)
>
>         solr.server.url : URL of the Solr instance (mandatory)
>
>         solr.zookeeper.url : URL of the Zookeeper URL (mandatory if cloud
> value for solr.server.type)
>
>         solr.loadbalance.urls : Comma-separated string of Solr server
> strings to be used (madatory if lb value for solr.server.type)
>
>         solr.mapping.file : name of the mapping file for fields (default
> solrindex-mapping.xml)
>
>         solr.commit.size : buffer size when sending to Solr (default 1000)
>
>         solr.auth : use authentication (default false)
>
>         solr.auth.username : username for authentication
>
>         solr.auth.password : password for authentication
>
> 2016-01-17 19:19:42,973 INFO  indexer.IndexerMapReduce - IndexerMapReduce:
> crawldb: crawlDbyah/crawldb
>
> 2016-01-17 19:19:42,973 INFO  indexer.IndexerMapReduce - IndexerMapReduce:
> linkdb: crawlDbyah/linkdb
>
> 2016-01-17 19:19:42,973 INFO  indexer.IndexerMapReduce -
> IndexerMapReduces: adding segment: crawlDbyah/segments/20160117191906
>
> 2016-01-17 19:19:42,975 WARN  indexer.IndexerMapReduce - Ignoring linkDb
> for indexing, no linkDb found in path: crawlDbyah/linkdb
>
> 2016-01-17 19:19:43,807 WARN  conf.Configuration -
> file:/tmp/hadoop-rakesh/mapred/staging/rakesh2114349538/.staging/job_local2114349538_0001/job.xml:an
> attempt to override final parameter:
> mapreduce.job.end-notification.max.retry.interval;  Ignoring.
>
> 2016-01-17 19:19:43,809 WARN  conf.Configuration -
> file:/tmp/hadoop-rakesh/mapred/staging/rakesh2114349538/.staging/job_local2114349538_0001/job.xml:an
> attempt to override final parameter:
> mapreduce.job.end-notification.max.attempts;  Ignoring.
>
> 2016-01-17 19:19:43,963 WARN  conf.Configuration -
> file:/tmp/hadoop-rakesh/mapred/local/localRunner/rakesh/job_local2114349538_0001/job_local2114349538_0001.xml:an
> attempt to override final parameter:
> mapreduce.job.end-notification.max.retry.interval;  Ignoring.
>
> 2016-01-17 19:19:43,980 WARN  conf.Configuration -
> file:/tmp/hadoop-rakesh/mapred/local/localRunner/rakesh/job_local2114349538_0001/job_local2114349538_0001.xml:an
> attempt to override final parameter:
> mapreduce.job.end-notification.max.attempts;  Ignoring.
>
> 2016-01-17 19:19:44,260 INFO  anchor.AnchorIndexingFilter - Anchor
> deduplication is: off
>
> 2016-01-17 19:19:45,128 INFO  indexer.IndexWriters - Adding
> org.apache.nutch.indexwriter.solr.SolrIndexWriter
>
> 2016-01-17 19:19:45,148 INFO  solr.SolrUtils - Authenticating as: radmin
>
> 2016-01-17 19:19:45,318 INFO  solr.SolrMappingReader - source: content
> dest: content
>
> 2016-01-17 19:19:45,318 INFO  solr.SolrMappingReader - source: title dest:
> title
>
> 2016-01-17 19:19:45,318 INFO  solr.SolrMappingReader - source: host dest:
> host
>
> 2016-01-17 19:19:45,319 INFO  solr.SolrMappingReader - source: segment
> dest: segment
>
> 2016-01-17 19:19:45,319 INFO  solr.SolrMappingReader - source: boost dest:
> boost
>
> 2016-01-17 19:19:45,319 INFO  solr.SolrMappingReader - source: digest
> dest: digest
>
> 2016-01-17 19:19:45,319 INFO  solr.SolrMappingReader - source: tstamp
> dest: tstamp
>
> 2016-01-17 19:19:45,360 INFO  solr.SolrIndexWriter - Indexing 2 documents
>
> 2016-01-17 19:19:45,507 INFO  solr.SolrIndexWriter - Indexing 2 documents
>
> 2016-01-17 19:19:45,526 WARN  mapred.LocalJobRunner -
> job_local2114349538_0001
>
> java.lang.Exception: java.io.IOException
>
>         at
> org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
>
>         at
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
>
> Caused by: java.io.IOException
>
>         at
> org.apache.nutch.indexwriter.solr.SolrIndexWriter.makeIOException(SolrIndexWriter.java:171)
>
>         at
> org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:157)
>
>         at
> org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:115)
>
>         at
> org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:44)
>
>         at
> org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.close(ReduceTask.java:502)
>
>         at
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:456)
>
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
>
>         at
> org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
>
>         at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>
>         at java.lang.Thread.run(Thread.java:745)
>
> Caused by: org.apache.solr.client.solrj.SolrServerException: IOException
> occured when talking to server at: http://127.0.0.1:8983/solr/yah
>
>         at
> org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:566)
>
>         at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
>
>         at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
>
>         at
> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
>
>         at
> org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:153)
>
>         ... 11 more
>
> Caused by: org.apache.http.client.ClientProtocolException
>
>         at
> org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:186)
>
>         at
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
>
>         at
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
>
>         at
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
>
>         at
> org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:448)
>
>         ... 15 more
>
> Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot
> retry request with a non-repeatable request entity.
>
>         at
> org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:208)
>
>         at
> org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:195)
>
>         at
> org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:86)
>
>         at
> org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:108)
>
>         at
> org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
>
>         ... 19 more
>
> 2016-01-17 19:19:46,055 ERROR indexer.IndexingJob - Indexer:
> java.io.IOException: Job failed!
>
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)
>
>         at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:145)
>
>         at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:228)
>
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>
>         at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:237)
>
> On Mon, Jan 18, 2016 at 4:15 PM, Markus Jelsma <markus.jelsma@openindex.io>
> wrote:
>
> Hi - can you post the log output?
>
> Markus
>
> -----Original message-----
>
> From: Zara Parst<edotservice@gmail.com>
>
> Sent: Monday 18th January 2016 2:06
>
> To: dev@nutch.apache.org
>
> Subject: Nutch/Solr communication problem
>
> Hi everyone,
>
> I have a situation here: I am using Nutch 1.11 and Solr 5.4.
>
> Solr is protected by a username and password. I am passing the credentials
> to Solr using the following command:
>
> bin/crawl -i -Dsolr.server.url=http://localhost:8983/solr/abc -D solr.auth=true
>  -Dsolr.auth.username=xxxx  -Dsolr.auth.password=xxx  url crawlDbyah 1
>
> and I always get the same problem. Please help me feed data to a
> password-protected Solr.
>
> Below is the error message.
>
> Indexer: starting at 2016-01-17 19:01:12
>
> Indexer: deleting gone documents: false
>
> Indexer: URL filtering: false
>
> Indexer: URL normalizing: false
>
> Active IndexWriters :
>
> SolrIndexWriter
>
>         solr.server.type : Type of SolrServer to communicate with (default
> http however options include cloud, lb and concurrent)
>
>         solr.server.url : URL of the Solr instance (mandatory)
>
>         solr.zookeeper.url : URL of the Zookeeper URL (mandatory if cloud
> value for solr.server.type)
>
>         solr.loadbalance.urls : Comma-separated string of Solr server
> strings to be used (madatory if lb value for solr.server.type)
>
>         solr.mapping.file : name of the mapping file for fields (default
> solrindex-mapping.xml)
>
>         solr.commit.size : buffer size when sending to Solr (default 1000)
>
>         solr.auth : use authentication (default false)
>
>         solr.auth.username : username for authentication
>
>         solr.auth.password : password for authentication
>
> Indexing 2 documents
>
> Indexing 2 documents
>
> Indexer: java.io.IOException: Job failed!
>
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)
>
>         at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:145)
>
>         at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:228)
>
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>
>         at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:237)
>
> I also tried setting the username and password in nutch-default.xml, but I
> get the same error again. Please help me out.
>
>
>
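The quoted stack trace ends in NonRepeatableRequestException, which Apache HttpClient raises when it needs to re-send a request whose body cannot be replayed. A 401 challenge from a password-protected server is one situation that can force such a re-send, but whether that is what happened here is an assumption and is not confirmed anywhere in the thread. A minimal Python sketch for checking the Solr credentials independently of Nutch, by attaching the Basic header up front so that no challenge round trip is needed, might look like the following. The helper names, the `/select?q=*:*&rows=0` query, and the placeholder credentials are illustrative and do not come from the thread.

```python
import base64
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Return the value of an HTTP Basic Authorization header."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def build_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Build a GET request that carries the credentials preemptively."""
    request = urllib.request.Request(url)
    request.add_header("Authorization", basic_auth_header(username, password))
    return request


if __name__ == "__main__":
    # Hypothetical values: substitute the real core URL and credentials.
    req = build_request(
        "http://localhost:8983/solr/abc/select?q=*:*&rows=0", "xxxx", "xxx"
    )
    # A 200 status suggests the credentials are accepted; a 401 raises HTTPError.
    with urllib.request.urlopen(req, timeout=10) as response:
        print(response.status)
```

If this check succeeds while the Nutch indexing job still fails, the credentials themselves are probably fine and the problem more likely lies in how the indexing client sends them.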
