lucene-solr-user mailing list archives

From Jan Høydahl <>
Subject Re: Spread SolrCloud across two locations
Date Wed, 07 Jun 2017 11:11:50 GMT
Thanks for checking Shawn.

So rolling ZK restarts are bad, and ZK nodes with different configs are bad.
I guess this could still work if:
* All ZK config changes are done by stopping ALL ZK nodes
* All config changes are done in a controlled, manual way, so that DC1 doesn't come up by surprise with the old config
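
To make the quorum arithmetic behind these conditions concrete, here is a minimal Python sketch. It assumes a hypothetical layout of 3 voting ZK nodes in DC1 and 2 in DC2, and models only ZooKeeper's strict-majority rule (a leader needs more than half of the voters listed in the config the nodes are running with). It ignores observers, dynamic reconfiguration and timing, so treat it as an illustration rather than a description of real ZK behaviour.

```python
# Simplified model of ZooKeeper's majority rule for a hypothetical
# two-DC layout: 3 voting ZK nodes in DC1 and 2 in DC2.

def has_quorum(reachable_voters: int, configured_voters: int) -> bool:
    """True if the reachable voters are a strict majority of the voters
    listed in the configuration those nodes are running with."""
    return reachable_voters > configured_voters // 2


DC1_VOTERS = 3
DC2_VOTERS = 2
ORIGINAL_ENSEMBLE = DC1_VOTERS + DC2_VOTERS  # every node starts with 5 voters

# DC1 is lost: 2 of 5 is a minority, so there is no leader and
# SolrCloud cannot accept updates.
dc2_alone = has_quorum(DC2_VOTERS, ORIGINAL_ENSEMBLE)

# DC2 is manually reconfigured into a 2-voter ensemble: 2 of 2 is a majority.
dc2_reconfigured = has_quorum(DC2_VOTERS, DC2_VOTERS)

# DC1 comes back still carrying the old 5-voter config: 3 of 5 is also a
# majority, so two independent ensembles could each elect a leader.
dc1_back_with_old_config = has_quorum(DC1_VOTERS, ORIGINAL_ENSEMBLE)

print(dc2_alone, dc2_reconfigured, dc1_back_with_old_config)  # False True True
```

In this model the third result is the dangerous one: a DC1 that returns with its old config can form its own majority while the reconfigured DC2 ensemble is still running, which is why DC1 must not come back unannounced.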

PS: I was not proposing an *automatic* triggering of a reconfiguration script, but rather
to have a script that someone runs manually in order to make sure one does not mess up the
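
A manually run reconfiguration script could start with a pre-flight check along these lines, refusing to proceed if the ZK nodes disagree about the ensemble membership. This is a hypothetical sketch: it assumes each node's config text (containing `server.N=host:port:port` lines) has already been fetched, the function names are illustrative, and it compares the server entries verbatim rather than validating them.

```python
import re
from typing import Dict

# Matches lines such as "server.1=zk1.dc1:2888:3888" (the value may carry
# suffixes like ":participant" or ";2181"; it is compared verbatim).
SERVER_LINE = re.compile(r"^\s*server\.(\d+)\s*=\s*(\S+)\s*$")


def server_entries(zoo_cfg_text: str) -> Dict[int, str]:
    """Collect the server.N entries from the text of one ZK config file."""
    entries: Dict[int, str] = {}
    for line in zoo_cfg_text.splitlines():
        match = SERVER_LINE.match(line)
        if match:
            entries[int(match.group(1))] = match.group(2)
    return entries


def configs_agree(configs_by_node: Dict[str, str]) -> bool:
    """True only if every node lists exactly the same server.N entries."""
    distinct = {
        tuple(sorted(server_entries(text).items()))
        for text in configs_by_node.values()
    }
    return len(distinct) == 1


if __name__ == "__main__":
    current = "tickTime=2000\nserver.1=zk1:2888:3888\nserver.2=zk2:2888:3888\n"
    stale = current + "server.3=zk3:2888:3888\n"
    print(configs_agree({"zk1": current, "zk2": current}))  # True
    print(configs_agree({"zk1": current, "zk2": stale}))    # False
```

A check like this only catches config drift at the moment the script runs; it does not prevent a node from being started later with an outdated file.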

Jan Høydahl, search solution architect
Cominvent AS -

> On 2 Jun 2017 at 14:57, Shawn Heisey <> wrote:
> On 5/29/2017 8:57 AM, Jan Høydahl wrote:
>> And if you start all three in DC1, you have 3+3 voting, what would
>> then happen? Any chance of state corruption?
>> I believe that my solution isolates manual change to two ZK nodes in
>> DC2, while your requires config change to 1 in DC2 and manual
>> start/stop of 1 in DC1.
> I took the scenario to the zookeeper user list.  Here's the thread:
> I'm not completely clear on what they're saying, but here's what I think
> it means: dealing with a loss of DC1 by reconfiguring ZK servers in DC2
> might work, or it might crash and burn once connectivity to DC1 is restored.
>> Well, that’s not up to me to decide; it’s the customer environment
>> that sets the constraints. They currently have 2 independent geo
>> locations, and Solr is just a dependency of some other app they need
>> to install, so I doubt they would be happy to start adding racks or
>> independent power/network for this alone. Of course, if they already
>> have such redundancy within one of the DCs, placing a 3rd ZK there is
>> an ideal solution with probably good enough HA. If not, I’m looking
>> for the second-best, low-friction, software-only approach.
> Even if all goes well with scripted reconfiguration of DC2, I don't
> think I'd want to try and automate it, because of the chance for a brief
> outage to trigger it.  Without automation, if the failure happened at
> just the wrong moment, it could be a while before anyone notices, and it
> might be hours after it gets noticed before relevant personnel are in a
> position to run the reconfiguration script on DC2, during which you'd
> have a read-only SolrCloud.
> Search is frequently such a critical part of a web application that
> if it doesn't work, there IS no web application.  That certainly
> describes the systems that use the Solr installations that I manage.
> For that kind of application, damage to reputation caused by a couple of
> hours where the website doesn't get any updates might be MUCH more
> expensive than the monthly cost for a virtual private server from a
> hosting company.
> Thanks,
> Shawn
