nutch-dev mailing list archives

From "Lewis John McGibbney (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-2269) Clean not working after crawl
Date Tue, 16 Aug 2016 17:29:21 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423108#comment-15423108 ]

Lewis John McGibbney commented on NUTCH-2269:
---------------------------------------------

[~wastl-nagel] said

bq. Are you able to reproduce the problem with the correct Solr version?

It looks like we are able to reproduce this against Solr 5.4.1, using Nutch 1.12.
I am going to try against the master branch to see whether it still occurs there.

> Clean not working after crawl
> -----------------------------
>
>                 Key: NUTCH-2269
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2269
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer
>    Affects Versions: 1.12
>         Environment: Vagrant, Ubuntu, Java 8, Solr 4.10
>            Reporter: Francesco Capponi
>             Fix For: 1.13
>
>
> I have been having this problem for a while, and I had to roll back to using the old solr clean instead of the newer version.
> Once it has correctly inserted/updated every document in Nutch, the clean step fails and returns error 255:
> {quote}
> 2016-05-30 10:13:04,992 WARN  output.FileOutputCommitter - Output Path is null in setupJob()
> 2016-05-30 10:13:07,284 INFO  indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: content dest: content
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: title dest: title
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: host dest: host
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: segment dest: segment
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: boost dest: boost
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: digest dest: digest
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
> 2016-05-30 10:13:08,133 INFO  solr.SolrIndexWriter - SolrIndexer: deleting 15/15 documents
> 2016-05-30 10:13:08,919 WARN  output.FileOutputCommitter - Output Path is null in cleanupJob()
> 2016-05-30 10:13:08,937 WARN  mapred.LocalJobRunner - job_local662730477_0001
> java.lang.Exception: java.lang.IllegalStateException: Connection pool shut down
> 	at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
> 	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
> Caused by: java.lang.IllegalStateException: Connection pool shut down
> 	at org.apache.http.util.Asserts.check(Asserts.java:34)
> 	at org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:169)
> 	at org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:202)
> 	at org.apache.http.impl.conn.PoolingClientConnectionManager.requestConnection(PoolingClientConnectionManager.java:184)
> 	at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:415)
> 	at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
> 	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
> 	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
> 	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
> 	at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:480)
> 	at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:241)
> 	at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:230)
> 	at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:150)
> 	at org.apache.solr.client.solrj.SolrClient.commit(SolrClient.java:483)
> 	at org.apache.solr.client.solrj.SolrClient.commit(SolrClient.java:464)
> 	at org.apache.nutch.indexwriter.solr.SolrIndexWriter.commit(SolrIndexWriter.java:190)
> 	at org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:178)
> 	at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:115)
> 	at org.apache.nutch.indexer.CleaningJob$DeleterReducer.close(CleaningJob.java:120)
> 	at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:237)
> 	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
> 	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
> 	at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:745)
> 2016-05-30 10:13:09,299 ERROR indexer.CleaningJob - CleaningJob: java.io.IOException: Job failed!
> 	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)
> 	at org.apache.nutch.indexer.CleaningJob.delete(CleaningJob.java:172)
> 	at org.apache.nutch.indexer.CleaningJob.run(CleaningJob.java:195)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> 	at org.apache.nutch.indexer.CleaningJob.main(CleaningJob.java:206)
> {quote}
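
One possible reading of the trace above: the commit issued from SolrIndexWriter.close() reaches an HTTP connection pool that has already been shut down, so the lease fails with IllegalStateException. The trace does not show what shut the pool down. The sketch below only illustrates that ordering problem in isolation. ToyClient and both helper methods are hypothetical stand-ins written for this example; they are not Nutch, SolrJ, or HttpClient code.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class PoolShutdownSketch {

    /** Hypothetical stand-in for a pooled client; NOT a SolrJ or HttpClient class. */
    static final class ToyClient implements AutoCloseable {
        private final AtomicBoolean open = new AtomicBoolean(true);

        /** Pretends to send a commit; refuses once the pool is shut down. */
        void commit() {
            if (!open.get()) {
                // Same message as in the trace, reproduced here by hand.
                throw new IllegalStateException("Connection pool shut down");
            }
        }

        @Override
        public void close() {
            open.set(false);
        }
    }

    /** Close first, then commit. Returns the exception message, or null if none. */
    static String commitAfterClose() {
        ToyClient client = new ToyClient();
        client.close();
        try {
            client.commit();
            return null;
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    /** Commit first, then close. Returns the exception message, or null if none. */
    static String commitBeforeClose() {
        try (ToyClient client = new ToyClient()) {
            client.commit();
            return null;
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("commit after close:  " + commitAfterClose());
        System.out.println("commit before close: " + commitBeforeClose());
    }
}
```

If this reading is right, the thing to check in the real code is whether anything closes or shuts down the Solr client before the final commit in close() runs.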



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
