nutch-dev mailing list archives

From "VictorHu (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (NUTCH-2205) Nutch solrdedup error in solrcloud for larger docs
Date Mon, 25 Jan 2016 09:28:40 GMT

     [ https://issues.apache.org/jira/browse/NUTCH-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

VictorHu updated NUTCH-2205:
----------------------------
    Affects Version/s: 2.3
          Environment: CentOS 6.5, JDK 1.7.0_75, Tomcat 8.0.9, Hadoop 2.5.2, ZooKeeper 3.4.6, HBase 0.98.8, Solr 4.8.1, Nutch 2.3.1
        Fix Version/s: 2.4
          Description: 
When the number of Solr docs is larger than 9000, the Nutch solrdedup job fails. This is the log:


http://10.192.1.100:8080/solr/myEnterpriseCollection_shard2_replica2
16/01/25 17:02:38 INFO solr.SolrDeleteDuplicates: SolrDeleteDuplicates: starting...
16/01/25 17:02:38 INFO solr.SolrDeleteDuplicates: SolrDeleteDuplicates: Solr url: http://10.192.1.100:8080/solr/myEnterpriseCollection_shard2_replica2
16/01/25 17:02:39 INFO client.RMProxy: Connecting to ResourceManager at master.Itble/10.192.1.100:8032
16/01/25 17:02:43 INFO mapreduce.JobSubmitter: number of splits:1
16/01/25 17:02:44 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1453104806095_0162
16/01/25 17:02:44 INFO impl.YarnClientImpl: Submitted application application_1453104806095_0162
16/01/25 17:02:44 INFO mapreduce.Job: The url to track the job: http://master.Itble:8088/proxy/application_1453104806095_0162/
16/01/25 17:02:44 INFO mapreduce.Job: Running job: job_1453104806095_0162
16/01/25 17:02:54 INFO mapreduce.Job: Job job_1453104806095_0162 running in uber mode : false
16/01/25 17:02:54 INFO mapreduce.Job:  map 0% reduce 0%
16/01/25 17:03:02 INFO mapreduce.Job: Task Id : attempt_1453104806095_0162_m_000000_0, Status : FAILED
Error: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: org.apache.solr.client.solrj.SolrServerException:
No live SolrServers available to handle this request:[http://10.192.1.100:8080/solr/myEnterpriseCollection_shard2_replica2,
http://10.192.1.101:8080/solr/myEnterpriseCollection_shard1_replica2, http://10.192.1.103:8080/solr/myEnterpriseCollection_shard2_replica1]
        at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:554)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
        at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:91)
        at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
        at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat.createRecordReader(SolrDeleteDuplicates.java:291)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:492)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:735)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

16/01/25 17:03:12 INFO mapreduce.Job: Task Id : attempt_1453104806095_0162_m_000000_1, Status : FAILED
Error: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: org.apache.solr.client.solrj.SolrServerException:
No live SolrServers available to handle this request:[http://10.192.1.100:8080/solr/myEnterpriseCollection_shard2_replica2,
http://10.192.1.101:8080/solr/myEnterpriseCollection_shard1_replica2, http://10.192.1.103:8080/solr/myEnterpriseCollection_shard2_replica1,
http://10.192.1.102:8080/solr/myEnterpriseCollection_shard1_replica1]
        at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:554)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
        at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:91)
        at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
        at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat.createRecordReader(SolrDeleteDuplicates.java:291)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:492)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:735)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

16/01/25 17:03:22 INFO mapreduce.Job: Task Id : attempt_1453104806095_0162_m_000000_2, Status : FAILED
Error: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: org.apache.solr.client.solrj.SolrServerException:
No live SolrServers available to handle this request:[http://10.192.1.100:8080/solr/myEnterpriseCollection_shard2_replica2,
http://10.192.1.103:8080/solr/myEnterpriseCollection_shard2_replica1, http://10.192.1.102:8080/solr/myEnterpriseCollection_shard1_replica1]
        at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:554)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
        at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:91)
        at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
        at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat.createRecordReader(SolrDeleteDuplicates.java:291)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:492)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:735)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

16/01/25 17:03:31 INFO mapreduce.Job:  map 100% reduce 100%
16/01/25 17:03:31 INFO mapreduce.Job: Job job_1453104806095_0162 failed with state FAILED due to: Task failed task_1453104806095_0162_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0

16/01/25 17:03:31 INFO mapreduce.Job: Counters: 8
        Job Counters 
                Failed map tasks=4
                Launched map tasks=4
                Other local map tasks=4
                Total time spent by all maps in occupied slots (ms)=30150
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=30150
                Total vcore-seconds taken by all map tasks=30150
                Total megabyte-seconds taken by all map tasks=46310400
              Summary: Nutch solrdedup error in solrcloud for larger docs (was: Nutch solrdedup error in solrcloud for doc)

> Nutch solrdedup error in solrcloud for larger docs 
> ---------------------------------------------------
>
>                 Key: NUTCH-2205
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2205
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer
>    Affects Versions: 2.3
>         Environment: CentOS 6.5, JDK 1.7.0_75, Tomcat 8.0.9, Hadoop 2.5.2, ZooKeeper 3.4.6, HBase 0.98.8, Solr 4.8.1, Nutch 2.3.1
>            Reporter: VictorHu
>             Fix For: 2.4



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
