nutch-dev mailing list archives

From Markus Jelsma <markus.jel...@openindex.io>
Subject RE: Nutch/Solr communication problem
Date Mon, 18 Jan 2016 20:37:12 GMT
Hi - then that is the problem. If there were an authentication issue, a 401-like exception should
be visible, not the stack trace you posted. Nutch does not yet support Solr 5.x, but a colleague
recently uploaded a patch for Nutch 1.11 with Solr 5.x support. We use it in production, although
without basic HTTP authentication, but it should work as it is based on the older indexer
plugins.

See: https://issues.apache.org/jira/browse/NUTCH-2197

Markus
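
One way to check the 401 point above independently of Nutch is to query Solr directly and
compare the HTTP status returned with and without credentials. The following is only a minimal
sketch under stated assumptions: it assumes curl is installed and that the core answers on the
standard /select handler. SOLR_URL, SOLR_USER and SOLR_PASS are placeholder environment
variables, and explain_status and solr_status are hypothetical helper names; none of these
come from the thread.

```shell
#!/bin/sh
# Sketch only: compare Solr's HTTP status with and without basic-auth credentials.
# SOLR_URL, SOLR_USER and SOLR_PASS are assumed placeholders, not values from the thread.

# Map an HTTP status code to a short human-readable diagnosis.
explain_status() {
  case "$1" in
    200) echo "OK: request accepted" ;;
    401) echo "Unauthorized: credentials missing or rejected" ;;
    403) echo "Forbidden: user lacks permission" ;;
    404) echo "Not found: check the core name in the URL" ;;
    000) echo "No response: Solr unreachable" ;;
    *)   echo "Unexpected status: $1" ;;
  esac
}

# Print only the HTTP status code of a match-all query that returns no rows.
# Any extra arguments (for example -u user:pass) are passed through to curl.
solr_status() {
  url="$1"
  shift
  curl -s -o /dev/null -w '%{http_code}' "$@" "${url}/select?q=*:*&rows=0"
}

# Run the comparison only when a URL has been supplied.
if [ -n "${SOLR_URL:-}" ]; then
  anon=$(solr_status "$SOLR_URL")
  auth=$(solr_status "$SOLR_URL" -u "${SOLR_USER:-}:${SOLR_PASS:-}")
  echo "without credentials: $anon ($(explain_status "$anon"))"
  echo "with credentials:    $auth ($(explain_status "$auth"))"
fi
```

Saved under a hypothetical name such as check_solr.sh, it could be run as
SOLR_URL=http://localhost:8983/solr/abc SOLR_USER=xxxx SOLR_PASS=xxx sh check_solr.sh.
A 401 without credentials and a 200 with them would suggest that Solr accepts the
credentials themselves, so the indexing client would be the next place to look.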

-----Original message-----
From: Zara Parst <edotservice@gmail.com>
Sent: Monday 18th January 2016 21:28
To: dev@nutch.apache.org
Subject: Re: Nutch/Solr communication problem

I am using Solr 5.4 and Nutch 1.11.

On Tue, Jan 19, 2016 at 1:46 AM, Markus Jelsma <markus.jelsma@openindex.io> wrote:
Hi - it was an answer to your question whether I have ever used it. Yes, I patched and committed
it, and that is why I asked whether you're using Solr 5. So again, are you using Solr 5?

Markus

-----Original message-----
From: Zara Parst <edotservice@gmail.com>
Sent: Monday 18th January 2016 16:16
To: dev@nutch.apache.org
Subject: Re: Nutch/Solr communication problem

Would you mind sharing that patch?

On Mon, Jan 18, 2016 at 8:28 PM, Markus Jelsma <markus.jelsma@openindex.io> wrote:

Yes, I have used it. I made the damn patch myself years ago, and I used the same configuration.
Command line and config file work the same way.

Markus

-----Original message-----
From: Zara Parst <edotservice@gmail.com>
Sent: Monday 18th January 2016 12:55
To: dev@nutch.apache.org
Subject: Re: Nutch/Solr communication problem

Dear Markus,

Are you just speaking blindly or what? My question is: did you ever try pushing an index to a
Solr instance that is password protected? If so, can you tell me what configuration you used,
and whether you set it in a config file or on the command line?

thanks

On Mon, Jan 18, 2016 at 4:50 PM, Markus Jelsma <markus.jelsma@openindex.io> wrote:

Hi - This doesn't look like an HTTP basic authentication problem. Are you running Solr 5.x?

Markus

-----Original message-----
From: Zara Parst <edotservice@gmail.com>
Sent: Monday 18th January 2016 11:55
To: dev@nutch.apache.org
Subject: Re: Nutch/Solr communication problem

SolrIndexWriter

        solr.server.type : Type of SolrServer to communicate with (default http however
options include cloud, lb and concurrent)

        solr.server.url : URL of the Solr instance (mandatory)

        solr.zookeeper.url : URL of the Zookeeper URL (mandatory if cloud value for solr.server.type)

        solr.loadbalance.urls : Comma-separated string of Solr server strings to be used
(madatory if lb value for solr.server.type)

        solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)

        solr.commit.size : buffer size when sending to Solr (default 1000)

        solr.auth : use authentication (default false)

        solr.auth.username : username for authentication

        solr.auth.password : password for authentication

2016-01-17 19:19:42,973 INFO  indexer.IndexerMapReduce - IndexerMapReduce: crawldb: crawlDbyah/crawldb

2016-01-17 19:19:42,973 INFO  indexer.IndexerMapReduce - IndexerMapReduce: linkdb: crawlDbyah/linkdb

2016-01-17 19:19:42,973 INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment:
crawlDbyah/segments/20160117191906

2016-01-17 19:19:42,975 WARN  indexer.IndexerMapReduce - Ignoring linkDb for indexing, no
linkDb found in path: crawlDbyah/linkdb

2016-01-17 19:19:43,807 WARN  conf.Configuration - file:/tmp/hadoop-rakesh/mapred/staging/rakesh2114349538/.staging/job_local2114349538_0001/job.xml:an
attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.

2016-01-17 19:19:43,809 WARN  conf.Configuration - file:/tmp/hadoop-rakesh/mapred/staging/rakesh2114349538/.staging/job_local2114349538_0001/job.xml:an
attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.

2016-01-17 19:19:43,963 WARN  conf.Configuration - file:/tmp/hadoop-rakesh/mapred/local/localRunner/rakesh/job_local2114349538_0001/job_local2114349538_0001.xml:an
attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.

2016-01-17 19:19:43,980 WARN  conf.Configuration - file:/tmp/hadoop-rakesh/mapred/local/localRunner/rakesh/job_local2114349538_0001/job_local2114349538_0001.xml:an
attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.

2016-01-17 19:19:44,260 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off

2016-01-17 19:19:45,128 INFO  indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter

2016-01-17 19:19:45,148 INFO  solr.SolrUtils - Authenticating as: radmin

2016-01-17 19:19:45,318 INFO  solr.SolrMappingReader - source: content dest: content

2016-01-17 19:19:45,318 INFO  solr.SolrMappingReader - source: title dest: title

2016-01-17 19:19:45,318 INFO  solr.SolrMappingReader - source: host dest: host

2016-01-17 19:19:45,319 INFO  solr.SolrMappingReader - source: segment dest: segment

2016-01-17 19:19:45,319 INFO  solr.SolrMappingReader - source: boost dest: boost

2016-01-17 19:19:45,319 INFO  solr.SolrMappingReader - source: digest dest: digest

2016-01-17 19:19:45,319 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp

2016-01-17 19:19:45,360 INFO  solr.SolrIndexWriter - Indexing 2 documents

2016-01-17 19:19:45,507 INFO  solr.SolrIndexWriter - Indexing 2 documents

2016-01-17 19:19:45,526 WARN  mapred.LocalJobRunner - job_local2114349538_0001

java.lang.Exception: java.io.IOException

        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)

        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)

Caused by: java.io.IOException

        at org.apache.nutch.indexwriter.solr.SolrIndexWriter.makeIOException(SolrIndexWriter.java:171)

        at org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:157)

        at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:115)

        at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:44)

        at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.close(ReduceTask.java:502)

        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:456)

        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)

        at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)

        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

        at java.util.concurrent.FutureTask.run(FutureTask.java:266)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

        at java.lang.Thread.run(Thread.java:745)

Caused by: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking
to server at: http://127.0.0.1:8983/solr/yah

        at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:566)

        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)

        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)

        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)

        at org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:153)

        ... 11 more

Caused by: org.apache.http.client.ClientProtocolException

        at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:186)

        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)

        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)

        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)

        at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:448)

        ... 15 more

Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot retry request with
a non-repeatable request entity.

        at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:208)

        at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:195)

        at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:86)

        at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:108)

        at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)

        ... 19 more

2016-01-17 19:19:46,055 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!

        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)

        at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:145)

        at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:228)

        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)

        at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:237)

On Mon, Jan 18, 2016 at 4:15 PM, Markus Jelsma <markus.jelsma@openindex.io> wrote:

Hi - can you post the log output?

Markus

-----Original message-----
From: Zara Parst <edotservice@gmail.com>
Sent: Monday 18th January 2016 2:06
To: dev@nutch.apache.org
Subject: Nutch/Solr communication problem

Hi everyone,

I have a situation here: I am using Nutch 1.11 and Solr 5.4.

Solr is protected by a username and password. I am passing the credentials to Solr using the
following command:

bin/crawl -i -Dsolr.server.url=http://localhost:8983/solr/abc -D solr.auth=true -Dsolr.auth.username=xxxx -Dsolr.auth.password=xxx url crawlDbyah 1

and I always get the same problem. Please help me feed data to a protected Solr.

Below is the error message.

Indexer: starting at 2016-01-17 19:01:12

Indexer: deleting gone documents: false

Indexer: URL filtering: false

Indexer: URL normalizing: false

Active IndexWriters :

SolrIndexWriter

        solr.server.type : Type of SolrServer to communicate with (default http however
options include cloud, lb and concurrent)

        solr.server.url : URL of the Solr instance (mandatory)

        solr.zookeeper.url : URL of the Zookeeper URL (mandatory if cloud value for solr.server.type)

        solr.loadbalance.urls : Comma-separated string of Solr server strings to be used
(madatory if lb value for solr.server.type)

        solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)

        solr.commit.size : buffer size when sending to Solr (default 1000)

        solr.auth : use authentication (default false)

        solr.auth.username : username for authentication

        solr.auth.password : password for authentication

Indexing 2 documents

Indexing 2 documents

Indexer: java.io.IOException: Job failed!

        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)

        at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:145)

        at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:228)

        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)

        at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:237)

I also tried setting the username and password in nutch-default.xml, but got the same error
again. Please help me out.


