lucene-java-user mailing list archives

From Brian Wright <bri...@marketo.com>
Subject Solr replica recovery mode triggers
Date Thu, 13 Oct 2016 23:33:37 GMT
Hello,

I'm looking to see if there is any documentation describing situations 
which could trigger a shard replica to go into recovery. We're having 
issues with some of our replicas randomly going into recovery mode. I 
know of a few conditions that can cause this: for example, the node 
loses connectivity to ZooKeeper (i.e., timeouts) and clusterstate.json 
is updated to show the node as down, or hard commits are issued for 
every query (we're not doing this), which can overtax the server and 
also cause timeouts.

We would like to get to the bottom of why these nodes are being 
asked to recover, and by whom (i.e., ZooKeeper / Overseer or the 
shard leader).
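
In the meantime, one thing we could do to narrow down when a replica 
flips state is poll the Collections API and log anything that is not 
active, then line those timestamps up against the Solr and ZooKeeper 
logs. Below is a minimal sketch, assuming the JSON layout returned by 
action=CLUSTERSTATUS (cluster -> collections -> shards -> replicas -> 
state); the function names and the base URL are placeholders of mine, 
not anything Solr ships. It will not say who requested the recovery, 
only when the state changed.

```python
import json
from urllib.request import urlopen


def fetch_cluster_status(base_url):
    """Fetch CLUSTERSTATUS as a dict.

    base_url is a placeholder, e.g. "http://localhost:8983/solr".
    """
    url = base_url + "/admin/collections?action=CLUSTERSTATUS&wt=json"
    with urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def non_active_replicas(cluster_status):
    """Return (collection, shard, replica, node, state) for every
    replica whose state is anything other than "active"."""
    found = []
    collections = cluster_status.get("cluster", {}).get("collections", {})
    for coll_name, coll in collections.items():
        for shard_name, shard in coll.get("shards", {}).items():
            for replica_name, replica in shard.get("replicas", {}).items():
                state = replica.get("state")
                if state != "active":
                    found.append((coll_name, shard_name, replica_name,
                                  replica.get("node_name"), state))
    return found
```

Run from cron or a small loop, something like 
non_active_replicas(fetch_cluster_status(base_url)) with a timestamp on 
each line of output would give us a record to compare against the logs.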

The bigger reason for this is that some of our datasets can take days to 
recover. This is due to the types of queries being issued and the amount 
of continuous ingress traffic; our engineering team is currently working 
on optimizing these queries. Until then, I'd like to find a way to 
prevent these nodes from going into recovery mode in the first place.

Please let me know if there are any docs that describe recovery 
scenarios or troubleshooting, or if any of you have experience with 
this situation. Any help is greatly appreciated.

Thanks.

-- 
Brian Wright
Sr. Systems Engineer
901 Mariners Island Blvd Suite 200
San Mateo, CA 94404 USA
Email: brianw@marketo.com
Phone: +1.650.539.3530
www.marketo.com
