nutch-dev mailing list archives

From Markus Jelsma <markus.jel...@openindex.io>
Subject RE: Nutch/Solr communication problem
Date Mon, 18 Jan 2016 20:16:38 GMT
Hi - it was an answer to your question whether I have ever used it. Yes, I patched and committed
it, and that is also why I asked whether you're using Solr 5 or not. So again, are you using Solr 5?

Markus


-----Original message-----
From: Zara Parst <edotservice@gmail.com>
Sent: Monday 18th January 2016 16:16
To: dev@nutch.apache.org
Subject: Re: Nutch/Solr communication problem

Mind sharing that patch?

On Mon, Jan 18, 2016 at 8:28 PM, Markus Jelsma <markus.jelsma@openindex.io> wrote:
Yes, I have used it; I made the damn patch myself years ago, and I used the same configuration.
Command line or config works the same.

Markus

-----Original message-----

From: Zara Parst <edotservice@gmail.com>

Sent: Monday 18th January 2016 12:55

To: dev@nutch.apache.org

Subject: Re: Nutch/Solr communication problem

Dear Markus,

Are you just speaking blindly, or what? My concern is: did you ever try pushing an index to a Solr
instance that is password protected? If yes, can you tell me what configuration you used? If you
did it in a config file, let me know, or if you did it through the command line, please let me know.

thanks

On Mon, Jan 18, 2016 at 4:50 PM, Markus Jelsma <markus.jelsma@openindex.io> wrote:

Hi - This doesn't look like an HTTP basic authentication problem. Are you running Solr 5.x?

Markus

-----Original message-----

From: Zara Parst <edotservice@gmail.com>

Sent: Monday 18th January 2016 11:55

To: dev@nutch.apache.org

Subject: Re: Nutch/Solr communication problem

SolrIndexWriter

        solr.server.type : Type of SolrServer to communicate with (default http however
options include cloud, lb and concurrent)

        solr.server.url : URL of the Solr instance (mandatory)

        solr.zookeeper.url : URL of the Zookeeper URL (mandatory if cloud value for solr.server.type)

        solr.loadbalance.urls : Comma-separated string of Solr server strings to be used
(mandatory if lb value for solr.server.type)

        solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)

        solr.commit.size : buffer size when sending to Solr (default 1000)

        solr.auth : use authentication (default false)

        solr.auth.username : username for authentication

        solr.auth.password : password for authentication

2016-01-17 19:19:42,973 INFO  indexer.IndexerMapReduce - IndexerMapReduce: crawldb: crawlDbyah/crawldb

2016-01-17 19:19:42,973 INFO  indexer.IndexerMapReduce - IndexerMapReduce: linkdb: crawlDbyah/linkdb

2016-01-17 19:19:42,973 INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment:
crawlDbyah/segments/20160117191906

2016-01-17 19:19:42,975 WARN  indexer.IndexerMapReduce - Ignoring linkDb for indexing, no
linkDb found in path: crawlDbyah/linkdb

2016-01-17 19:19:43,807 WARN  conf.Configuration - file:/tmp/hadoop-rakesh/mapred/staging/rakesh2114349538/.staging/job_local2114349538_0001/job.xml:an
attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.

2016-01-17 19:19:43,809 WARN  conf.Configuration - file:/tmp/hadoop-rakesh/mapred/staging/rakesh2114349538/.staging/job_local2114349538_0001/job.xml:an
attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.

2016-01-17 19:19:43,963 WARN  conf.Configuration - file:/tmp/hadoop-rakesh/mapred/local/localRunner/rakesh/job_local2114349538_0001/job_local2114349538_0001.xml:an
attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.

2016-01-17 19:19:43,980 WARN  conf.Configuration - file:/tmp/hadoop-rakesh/mapred/local/localRunner/rakesh/job_local2114349538_0001/job_local2114349538_0001.xml:an
attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.

2016-01-17 19:19:44,260 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off

2016-01-17 19:19:45,128 INFO  indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter

2016-01-17 19:19:45,148 INFO  solr.SolrUtils - Authenticating as: radmin

2016-01-17 19:19:45,318 INFO  solr.SolrMappingReader - source: content dest: content

2016-01-17 19:19:45,318 INFO  solr.SolrMappingReader - source: title dest: title

2016-01-17 19:19:45,318 INFO  solr.SolrMappingReader - source: host dest: host

2016-01-17 19:19:45,319 INFO  solr.SolrMappingReader - source: segment dest: segment

2016-01-17 19:19:45,319 INFO  solr.SolrMappingReader - source: boost dest: boost

2016-01-17 19:19:45,319 INFO  solr.SolrMappingReader - source: digest dest: digest

2016-01-17 19:19:45,319 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp

2016-01-17 19:19:45,360 INFO  solr.SolrIndexWriter - Indexing 2 documents

2016-01-17 19:19:45,507 INFO  solr.SolrIndexWriter - Indexing 2 documents

2016-01-17 19:19:45,526 WARN  mapred.LocalJobRunner - job_local2114349538_0001

java.lang.Exception: java.io.IOException

        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)

        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)

Caused by: java.io.IOException

        at org.apache.nutch.indexwriter.solr.SolrIndexWriter.makeIOException(SolrIndexWriter.java:171)

        at org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:157)

        at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:115)

        at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:44)

        at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.close(ReduceTask.java:502)

        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:456)

        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)

        at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)

        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

        at java.util.concurrent.FutureTask.run(FutureTask.java:266)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

        at java.lang.Thread.run(Thread.java:745)

Caused by: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking
to server at: http://127.0.0.1:8983/solr/yah

        at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:566)

        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)

        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)

        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)

        at org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:153)

        ... 11 more

Caused by: org.apache.http.client.ClientProtocolException

        at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:186)

        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)

        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)

        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)

        at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:448)

        ... 15 more

Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot retry request with
a non-repeatable request entity.

        at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:208)

        at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:195)

        at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:86)

        at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:108)

        at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)

        ... 19 more

2016-01-17 19:19:46,055 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!

        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)

        at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:145)

        at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:228)

        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)

        at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:237)
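
For context, the root cause above (org.apache.http.client.NonRepeatableRequestException) is thrown by
HttpClient when it needs to resend a request, for example after the server answers the first attempt with
a 401 challenge, but the request body has already been streamed and cannot be replayed. One common
workaround is preemptive Basic authentication, so the Authorization header goes out with the very first
request. The following is only a minimal, self-contained sketch of that technique with HttpClient 4.x
against the Solr update handler; it is not the Nutch patch discussed in this thread, and the password and
payload are placeholders.

import org.apache.http.HttpHost;
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.client.AuthCache;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.client.protocol.HttpClientContext;
import org.apache.http.entity.ContentType;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.auth.BasicScheme;
import org.apache.http.impl.client.BasicAuthCache;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

public class PreemptiveSolrUpdate {
    public static void main(String[] args) throws Exception {
        HttpHost solr = new HttpHost("127.0.0.1", 8983, "http");

        // Credentials for the Solr host ("radmin" is taken from the log above,
        // the password is a placeholder).
        BasicCredentialsProvider credentials = new BasicCredentialsProvider();
        credentials.setCredentials(new AuthScope(solr),
                new UsernamePasswordCredentials("radmin", "CHANGE_ME"));

        // Pre-populate the auth cache so the Authorization header is sent with
        // the first request instead of only after a 401 challenge. This avoids
        // retrying a request whose entity has already been consumed.
        AuthCache authCache = new BasicAuthCache();
        authCache.put(solr, new BasicScheme());

        HttpClientContext context = HttpClientContext.create();
        context.setCredentialsProvider(credentials);
        context.setAuthCache(authCache);

        try (CloseableHttpClient client = HttpClients.createDefault()) {
            // Empty <add/> just to exercise the update handler of the "yah" core.
            HttpPost post = new HttpPost("http://127.0.0.1:8983/solr/yah/update?commit=true");
            post.setEntity(new StringEntity("<add></add>", ContentType.APPLICATION_XML));
            try (CloseableHttpResponse response = client.execute(post, context)) {
                System.out.println(response.getStatusLine());
            }
        }
    }
}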

On Mon, Jan 18, 2016 at 4:15 PM, Markus Jelsma <markus.jelsma@openindex.io> wrote:

Hi - can you post the log output?

Markus

-----Original message-----

From: Zara Parst <edotservice@gmail.com>

Sent: Monday 18th January 2016 2:06

To: dev@nutch.apache.org

Subject: Nutch/Solr communication problem

Hi everyone,

I have a situation here: I am using Nutch 1.11 and Solr 5.4.

Solr is protected by a username and password, and I am passing the credentials to Solr with the
following command:

bin/crawl -i -Dsolr.server.url=http://localhost:8983/solr/abc -D solr.auth=true -Dsolr.auth.username=xxxx -Dsolr.auth.password=xxx url crawlDbyah 1

and I always get the same problem. Please help me figure out how to feed data into a password-protected Solr.

Below is the error message.

Indexer: starting at 2016-01-17 19:01:12

Indexer: deleting gone documents: false

Indexer: URL filtering: false

Indexer: URL normalizing: false

Active IndexWriters :

SolrIndexWriter

        solr.server.type : Type of SolrServer to communicate with (default http however
options include cloud, lb and concurrent)

        solr.server.url : URL of the Solr instance (mandatory)

        solr.zookeeper.url : URL of the Zookeeper URL (mandatory if cloud value for solr.server.type)

        solr.loadbalance.urls : Comma-separated string of Solr server strings to be used
(mandatory if lb value for solr.server.type)

        solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)

        solr.commit.size : buffer size when sending to Solr (default 1000)

        solr.auth : use authentication (default false)

        solr.auth.username : username for authentication

        solr.auth.password : password for authentication

Indexing 2 documents

Indexing 2 documents

Indexer: java.io.IOException: Job failed!

        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)

        at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:145)

        at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:228)

        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)

        at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:237)

I also tried setting the username and password in nutch-default.xml, but I get the same error.
Please help me out.
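
For what it's worth, local overrides normally belong in conf/nutch-site.xml rather than in
nutch-default.xml. A minimal sketch of the relevant properties (the names are exactly those listed
by the SolrIndexWriter above; the URL and credentials are placeholders from the command line):

<!-- conf/nutch-site.xml: sketch only, adjust URL and credentials for your setup -->
<configuration>
  <property>
    <name>solr.server.url</name>
    <value>http://localhost:8983/solr/abc</value>
  </property>
  <property>
    <name>solr.auth</name>
    <value>true</value>
  </property>
  <property>
    <name>solr.auth.username</name>
    <value>xxxx</value>
  </property>
  <property>
    <name>solr.auth.password</name>
    <value>xxx</value>
  </property>
</configuration>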


