lucene-solr-user mailing list archives

From Robbie Douglas <rld...@cornell.edu>
Subject Could not publish that recovery failed
Date Fri, 03 Apr 2020 15:52:18 GMT
Hello,

We had an outage on one of our Solr nodes and are trying to figure out the
cause. Here is what came up in the Solr admin logs: three separate entries,
which I think occurred in this order, though I'm not certain.

Stopping recovery for core=[b1_shard5_replica_n16] coreNodeName=[core_node19]
 
Error while trying to recover. core=b1_shard5_replica_n16:org.apache.solr.common.SolrException: Error while saving shard term for collection: b1
         at org.apache.solr.cloud.ZkShardTerms.saveTerms(ZkShardTerms.java:307)
         at org.apache.solr.cloud.ZkShardTerms.forceSaveTerms(ZkShardTerms.java:281)
         at org.apache.solr.cloud.ZkShardTerms.startRecovering(ZkShardTerms.java:227)
         at org.apache.solr.cloud.ZkController.publish(ZkController.java:1576)
         at org.apache.solr.cloud.ZkController.publish(ZkController.java:1500)
         at org.apache.solr.cloud.RecoveryStrategy.doSyncOrReplicateRecovery(RecoveryStrategy.java:577)
         at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:326)
         at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:307)
         at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:181)
         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
         at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
         at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /collections/b1/terms/shard5
         at org.apache.zookeeper.KeeperException.create(KeeperException.java:130)
         at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
         at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1336)
         at org.apache.solr.common.cloud.SolrZkClient.lambda$setData$6(SolrZkClient.java:370)
         at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:71)
         at org.apache.solr.common.cloud.SolrZkClient.setData(SolrZkClient.java:370)
         at org.apache.solr.cloud.ZkShardTerms.saveTerms(ZkShardTerms.java:297)
         ... 14 more
 
Could not publish that recovery failed:org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /overseer/queue
         at org.apache.zookeeper.KeeperException.create(KeeperException.java:130)
         at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
         at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1111)
         at org.apache.solr.common.cloud.SolrZkClient.lambda$exists$2(SolrZkClient.java:322)
         at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:71)
         at org.apache.solr.common.cloud.SolrZkClient.exists(SolrZkClient.java:322)
         at org.apache.solr.cloud.ZkDistributedQueue.offer(ZkDistributedQueue.java:309)
         at org.apache.solr.cloud.ZkController.publish(ZkController.java:1587)
         at org.apache.solr.cloud.ZkController.publish(ZkController.java:1500)
         at org.apache.solr.cloud.RecoveryStrategy.recoveryFailed(RecoveryStrategy.java:190)
         at org.apache.solr.cloud.RecoveryStrategy.doSyncOrReplicateRecovery(RecoveryStrategy.java:715)
         at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:326)
         at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:307)
         at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:181)
         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
         at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
         at java.lang.Thread.run(Thread.java:745)


Solr is 8.1.1 with ZooKeeper 3.4.9, deployed on the same nodes.

The Solr JVM arguments look like this:

-DSTOP.KEY=solrrocks
-DSTOP.PORT=7983
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.local.only=false
-Dcom.sun.management.jmxremote.port=18983
-Dcom.sun.management.jmxremote.rmi.port=18983
-Dcom.sun.management.jmxremote.ssl=false
-Djetty.home=/cul/app/solr/solr/server
-Djetty.port=8983
-Dlog4j.configurationFile=file:/cul/data/solr/log4j2.xml
-Dsolr.data.home=
-Dsolr.default.confdir=/cul/app/solr/solr/server/solr/configsets/_default/conf
-Dsolr.install.dir=/cul/app/solr/solr
-Dsolr.jetty.https.port=8983
-Dsolr.log.dir=/cul/data/solr/logs
-Dsolr.log.muteconsole
-Dsolr.solr.home=/cul/data/solr/data
-Duser.timezone=UTC
-DzkClientTimeout=15000
-DzkHost=zk-host1:2181, zk-host2:2181, zk-host3:2181
-XX:+AlwaysPreTouch
-XX:+ParallelRefProcEnabled
-XX:+PerfDisableSharedMem
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCDateStamps
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution
-XX:+UseG1GC
-XX:+UseGCLogFileRotation
-XX:+UseLargePages
-XX:GCLogFileSize=20M
-XX:MaxGCPauseMillis=250
-XX:NumberOfGCLogFiles=9
-XX:OnOutOfMemoryError=/cul/app/solr/solr/bin/oom_solr.sh 8983 /cul/data/solr/logs
-Xloggc:/cul/data/solr/logs/solr_gc.log
-Xms8g
-Xmx8g
-Xss256k
-verbose:gc
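
One thing that could be checked from this configuration: since
-XX:+PrintGCApplicationStoppedTime is enabled and zkClientTimeout is 15000 ms,
solr_gc.log can be scanned for stop-the-world pauses near or above that
timeout. Long JVM pauses are one common cause of ZooKeeper session expiry,
though nothing in the logs above confirms that is what happened here. Below is
a minimal Python sketch, assuming the Java 8 log line format "Total time for
which application threads were stopped: N seconds". The function name
long_pauses and the threshold are illustrative and not part of Solr.

```python
import re

# Line format assumed from -XX:+PrintGCApplicationStoppedTime on Java 8, e.g.
# "...: Total time for which application threads were stopped: 0.0102030 seconds, ..."
STOPPED_RE = re.compile(
    r"Total time for which application threads were stopped: ([\d.]+) seconds"
)


def long_pauses(lines, threshold_seconds):
    """Return (line_number, pause_seconds) for each pause >= threshold_seconds."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = STOPPED_RE.search(line)
        if match is None:
            continue  # not a "threads were stopped" line
        pause = float(match.group(1))
        if pause >= threshold_seconds:
            hits.append((number, pause))
    return hits


def scan_file(path, threshold_seconds):
    """Convenience wrapper: apply long_pauses to a GC log file on disk."""
    with open(path, "r", errors="replace") as handle:
        return long_pauses(handle, threshold_seconds)
```

For example, scan_file("/cul/data/solr/logs/solr_gc.log", 5.0) would list
pauses of five seconds or more. The 5.0 threshold is an arbitrary choice set
well under the 15-second session timeout so that near misses also show up.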


Any ideas on what could cause this, or what to keep an eye on, would be
greatly appreciated.

Thanks,
Robbie

 



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
