lucene-solr-user mailing list archives

From Abhishek Mishra <solrmis...@gmail.com>
Subject Inconsistent recovery status of replicas
Date Mon, 07 Dec 2020 09:55:24 GMT
Hello guys,
I am using SolrCloud 7.7 on Kubernetes. When we add a replica, we see
inconsistent behavior: after the replica is added successfully, the node goes
into recovery, which sometimes takes 2-3 minutes to finish and sometimes more
than an hour. We get the error shown below.
We have 4 shards, each holding around 7 GB of data. The system metrics show
high bandwidth usage between the leader and the new replica node. Is there any
way to rate-limit this bandwidth, like the configuration we had for
master-slave replication (maxMbpersec or something like that)?
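To make the question concrete: a throughput cap of the kind asked about usually works by pacing writes so that the average transfer rate never exceeds a configured limit. The following Java sketch shows that idea on plain streams. It is not Solr code; `ThrottledCopy` and its `maxMBPerSec` constructor argument are hypothetical names chosen for illustration, and whether Solr 7.7 exposes an equivalent setting for SolrCloud recovery should be checked in the Solr Reference Guide.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/**
 * Hypothetical helper (not part of Solr): copies a stream while capping the
 * average throughput at a configured number of megabytes per second.
 */
public class ThrottledCopy {
    private final long bytesPerSecond;

    public ThrottledCopy(double maxMBPerSec) {
        // Rejects zero, negative values and NaN.
        if (!(maxMBPerSec > 0)) {
            throw new IllegalArgumentException("maxMBPerSec must be positive");
        }
        // 1 MB = 1024 * 1024 bytes; never let the limit round down to zero.
        this.bytesPerSecond = Math.max(1L, (long) (maxMBPerSec * 1024 * 1024));
    }

    /**
     * Copies all bytes from in to out. After each chunk it compares the time
     * actually elapsed with the time the bytes sent so far should have taken
     * at the cap, and sleeps off any difference. Returns the bytes copied.
     */
    public long copy(InputStream in, OutputStream out)
            throws IOException, InterruptedException {
        byte[] buffer = new byte[8192];
        long total = 0;
        long start = System.nanoTime();
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
            // Earliest point (ns since start) at which `total` bytes may be done.
            // Computed in double to avoid long overflow on multi-GB transfers.
            long allowedNanos = (long) (total * 1e9 / bytesPerSecond);
            long aheadNanos = allowedNanos - (System.nanoTime() - start);
            if (aheadNanos > 0) {
                Thread.sleep(aheadNanos / 1_000_000L, (int) (aheadNanos % 1_000_000L));
            }
        }
        out.flush();
        return total;
    }
}
```

Pacing against the running total, rather than per chunk, absorbs short bursts while keeping the long-run average at the cap. Any such cap also lengthens recovery: copying 7 GB at 50 MB/s, for example, takes roughly 2.4 minutes.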

Error

> 2020-12-01 13:40:34.983 ERROR (recoveryExecutor-4-thread-1-processing-n:solr-olxid-statefulset-pull-9.solr-olxid-statefulset-headless.relevance:8983_solr
> x:olxid-20200531_d6e431ec_shard2_replica_p3955 c:olxid-20200531_d6e431ec s:shard2 r:core_node3956)
> [c:olxid-20200531_d6e431ec s:shard2 r:core_node3956 x:olxid-20200531_d6e431ec_shard2_replica_p3955]
> o.a.s.c.RecoveryStrategy Error while trying to recover:org.apache.solr.client.solrj.SolrServerException:
> Timeout occured while waiting response from server at: http://solr-olxid-statefulset-tlog-7.solr-olxid-statefulset-headless.relevance:8983/solr/olxid-20200531_d6e431ec_shard2_replica_t139
> 	at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:654)
> 	at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
> 	at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
> 	at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
> 	at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211)
> 	at org.apache.solr.cloud.RecoveryStrategy.commitOnLeader(RecoveryStrategy.java:287)
> 	at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:215)
> 	at org.apache.solr.cloud.RecoveryStrategy.doReplicateOnlyRecovery(RecoveryStrategy.java:382)
> 	at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:328)
> 	at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:307)
> 	at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
> 	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> 	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> 	at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
> 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> 	at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.net.SocketTimeoutException: Read timed out
> 	at java.base/java.net.SocketInputStream.socketRead0(Native Method)
> 	at java.base/java.net.SocketInputStream.socketRead(SocketInputStream.java:115)
> 	at java.base/java.net.SocketInputStream.read(SocketInputStream.java:168)
> 	at java.base/java.net.SocketInputStream.read(SocketInputStream.java:140)
> 	at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
> 	at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
> 	at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:282)
> 	at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
> 	at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
> 	at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
> 	at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
> 	at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165)
> 	at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
> 	at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
> 	at org.apache.solr.util.stats.InstrumentedHttpRequestExecutor.execute(InstrumentedHttpRequestExecutor.java:120)
> 	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
> 	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
> 	at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
> 	at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
> 	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
> 	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
> 	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
> 	at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:542)
> 	... 16 more
> 2020-12-01 13:40:34.983 ERROR (recoveryExecutor-4-thread-1-processing-n:solr-olxid-statefulset-pull-9.solr-olxid-statefulset-headless.relevance:8983_solr
> x:olxid-20200531_d6e431ec_shard2_replica_p3955 c:olxid-20200531_d6e431ec s:shard2 r:core_node3956)
> [c:olxid-20200531_d6e431ec s:shard2 r:core_node3956 x:olxid-20200531_d6e431ec_shard2_replica_p3955]
> o.a.s.c.RecoveryStrategy Recovery failed - trying again... (1)
