lucene-solr-user mailing list archives

From Björn Häuser <bjoernhaeu...@gmail.com>
Subject Error when trying to replace node with Solr 6.6.0
Date Thu, 03 Aug 2017 16:51:01 GMT
Hey Folks,

today we hit the same error three times: a REPLACENODE call was not successful.

Here is our scenario: 

A 3-node SolrCloud cluster running in Kubernetes on top of AWS.

Today we wanted to rotate the underlying storage (increased from 50 GB to 300 GB).

After we rotated one node, we tried to replace it with these calls:

	• curl 'solr-2.solr-discovery.default.svc.cluster.local:8983/solr/admin/collections?action=REPLACENODE&source=solr-2.solr-discovery.default.svc.cluster.local.:8983_solr&target=solr-2.solr-discovery.default.svc.cluster.local.:8983_solr&async=4495d85b-0aa4-45ab-8067-9d7d4da375d3'
	• curl 'solr-2.solr-discovery.default.svc.cluster.local:8983/solr/admin/collections?action=REQUESTSTATUS&requestid=4495d85b-0aa4-45ab-8067-9d7d4da375d3'
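
The second call is part of the usual submit-then-poll pattern for async Collections API requests. Below is a minimal, self-contained Java sketch of that polling loop, using only JDK HTTP classes; the host and request id are the illustrative values from the calls above, and the plain string check is a simplification (a real client would parse the XML state field):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class PollRequestStatus {
    public static void main(String[] args) throws Exception {
        // Illustrative values taken from the curl calls above.
        String url = "http://solr-2.solr-discovery.default.svc.cluster.local:8983"
                + "/solr/admin/collections?action=REQUESTSTATUS"
                + "&requestid=4495d85b-0aa4-45ab-8067-9d7d4da375d3";

        // Poll until the async task leaves the submitted/running states.
        while (true) {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            StringBuilder body = new StringBuilder();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream()))) {
                String line;
                while ((line = in.readLine()) != null) {
                    body.append(line).append('\n');
                }
            }
            String response = body.toString();
            // Crude check; a real client would parse <str name="state"> instead.
            if (response.contains("completed") || response.contains("failed")
                    || response.contains("notfound")) {
                System.out.println(response);
                break;
            }
            Thread.sleep(5_000); // back off before polling again
        }
    }
}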

The error we got was:

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">28</int>
  </lst>
  <str name="Operation replacenode caused exception:">java.util.concurrent.RejectedExecutionException:java.util.concurrent.RejectedExecutionException:
    Task org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$15/509076276@5c9136c8
    rejected from org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor@1cce4506[Running,
    pool size = 10, active threads = 10, queued tasks = 0, completed tasks = 0]</str>
  <lst name="exception">
    <str name="msg">Task org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$15/509076276@5c9136c8
      rejected from org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor@1cce4506[Running,
      pool size = 10, active threads = 10, queued tasks = 0, completed tasks = 0]</str>
    <int name="rspCode">-1</int>
  </lst>
  <lst name="status">
    <str name="state">failed</str>
    <str name="msg">found [4495d85b-0aa4-45ab-8067-9d7d4da375d3] in failed tasks</str>
  </lst>
</response>
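
For context, this looks like the standard JDK rejection from a saturated thread pool that has no task queue. The following self-contained sketch reproduces the same "pool size = 10, active threads = 10, queued tasks = 0" rejection; the 10-thread, zero-queue configuration is an assumption chosen only to mirror the numbers in the message, not Solr's actual ExecutorUtil setup:

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolRejectionDemo {
    public static void main(String[] args) throws InterruptedException {
        // A fixed cap of 10 threads and a SynchronousQueue, i.e. no task
        // buffering -- hypothetical settings that mirror the state reported
        // in the exception above.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                10, 10, 0L, TimeUnit.MILLISECONDS, new SynchronousQueue<>());

        CountDownLatch block = new CountDownLatch(1);
        // Saturate all 10 threads with tasks that do not finish.
        for (int i = 0; i < 10; i++) {
            pool.execute(() -> {
                try { block.await(); } catch (InterruptedException ignored) {}
            });
        }
        try {
            // The 11th task finds no free thread and no queue slot -> rejected.
            pool.execute(() -> {});
        } catch (RejectedExecutionException e) {
            System.out.println("Rejected, as in the REPLACENODE error: " + e);
        } finally {
            block.countDown();
            pool.shutdown();
        }
    }
}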


The problem was that afterwards we had the same shard on the same node twice: one replica recovered, and we had to delete the other one manually. For some collections the REPLACENODE went through and everything was fine again.
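
For anyone who hits the same state: the duplicate can be removed with the Collections API's DELETEREPLICA action. A sketch with hypothetical collection, shard, and replica names (the duplicate's core_node name is visible via action=CLUSTERSTATUS):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class DeleteDuplicateReplica {
    public static void main(String[] args) throws Exception {
        // Hypothetical names -- take the collection, shard, and the
        // duplicate's core_node name from action=CLUSTERSTATUS first.
        String url = "http://solr-2.solr-discovery.default.svc.cluster.local:8983"
                + "/solr/admin/collections?action=DELETEREPLICA"
                + "&collection=mycollection"
                + "&shard=shard1"
                + "&replica=core_node3";

        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // Solr's response
            }
        }
    }
}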

Can you advise on what we did wrong here, or which configuration we need to adapt?

Thanks
Björn