lucene-solr-user mailing list archives

From Shawn Heisey
Subject Re: Role of zookeeper at runtime
Date Fri, 01 Mar 2013 00:06:22 GMT
On 2/28/2013 4:20 PM, varun srivastava wrote:
> We have 10 virtual data centres. They are set up this way because we do
> rolling updates: while the 1st DC is being indexed, the other 9 serve
> traffic. Indexing one DC takes 2 hours. With a single shard we used to
> index one DC and then quickly replicate the index to the other DCs using
> a master-slave setup. With SolrCloud we obviously can't index each DC
> sequentially, as that would take 2*10 hours. So we need a way of indexing
> 1 DC and then somehow quickly propagating the index binary to the others.
> What would you recommend for SolrCloud?

This is my understanding of how SolrCloud works.  If I am wrong about 
any of this, I'm sure one of the experts will correct me.  I'm still 
learning SolrCloud, so this is an opportunity for me to find out if I 
understand it right:

SolrCloud is not master-slave.  One replica of each shard is designated 
leader.  I think you can influence which one becomes leader, but I don't 
know how to do this.

When you index, the receiving node forwards the request to the leader of 
the correct shard.  The leader then processes the update request locally 
and sends it to all replicas of that shard, so they all index the same 
data independently.
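
To make that flow concrete, here is a toy Python model of it.  This is
only a sketch of my understanding, not Solr code: the Replica, Shard and
Cluster classes are made up for illustration, and picking the shard with
a crc32 hash of the document id is an assumption standing in for
whatever routing Solr really does.

```python
import zlib


class Replica:
    def __init__(self, name):
        self.name = name
        self.docs = {}  # doc id -> document; stands in for this replica's index

    def index_locally(self, doc):
        self.docs[doc["id"]] = doc


class Shard:
    def __init__(self, replicas):
        self.replicas = replicas
        self.leader = replicas[0]  # assumption: the first replica is the leader

    def leader_update(self, doc):
        # The leader processes the update locally...
        self.leader.index_locally(doc)
        # ...then sends the same update to every other replica of the shard,
        # so each replica indexes the same data independently.
        for replica in self.replicas:
            if replica is not self.leader:
                replica.index_locally(doc)


class Cluster:
    def __init__(self, shards):
        self.shards = shards

    def shard_for(self, doc_id):
        # Assumption: a stable hash of the id picks the shard.
        index = zlib.crc32(doc_id.encode("utf-8")) % len(self.shards)
        return self.shards[index]

    def receive_update(self, doc):
        # Whichever node receives the request forwards it to the leader
        # of the correct shard.
        self.shard_for(doc["id"]).leader_update(doc)


cluster = Cluster([
    Shard([Replica("shard1-a"), Replica("shard1-b")]),
    Shard([Replica("shard2-a"), Replica("shard2-b")]),
])
for i in range(10):
    cluster.receive_update({"id": f"doc{i}", "title": f"Title {i}"})
```

Note that nothing in this model copies index files between replicas
during normal indexing; every replica does the indexing work itself.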

If a node goes down, the remaining replicas handle requests and continue
to process any updates that come in.  When the down node comes back up,
the leader checks whether it can use its transaction log to sync up the
recovered node.  If it can, it does so.  If it can't, it tells the
recovered node to replicate the full index from the leader.  That is why
you must have the replication handler enabled on all SolrCloud nodes,
even though SolrCloud does not use traditional master/slave roles.
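
The recovery decision, as I understand it, can be sketched like this.
The function name plan_recovery, the integer version numbers and the
tlog_window parameter are all hypothetical; I am assuming the leader
keeps only a limited number of recent updates in its transaction log
and that versions are listed in increasing order.

```python
def plan_recovery(leader_versions, replica_last_version, tlog_window):
    """Decide how a recovered replica catches up with the leader.

    leader_versions: update versions the leader has applied, ascending.
    replica_last_version: the last version the recovered replica applied.
    tlog_window: how many recent updates the leader's transaction log
        retains (an assumed parameter for this sketch).
    """
    missed = [v for v in leader_versions if v > replica_last_version]
    if len(missed) <= tlog_window:
        # Everything the replica missed is still in the transaction log,
        # so the leader can replay just those updates.
        return ("replay", missed)
    # Too far behind: fall back to copying the whole index.
    return ("replicate", None)


leader = list(range(1, 11))  # the leader has applied versions 1..10
short_outage = plan_recovery(leader, replica_last_version=8, tlog_window=5)
long_outage = plan_recovery(leader, replica_last_version=2, tlog_window=5)
```

In this sketch the short outage yields a replay of the two missed
updates, while the long outage falls back to full replication.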

If the leader goes down, the remaining replicas elect a new leader.
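
A toy version of that election is below.  I am assuming the rule is
simply "the live replica that registered earliest wins", in the style
of the usual ZooKeeper election recipe; the real SolrCloud election
almost certainly has more checks than this.  The elect_leader function
and the (sequence number, name) tuples are made up for illustration.

```python
def elect_leader(replicas, live):
    """Pick a leader from the replicas that are still up.

    replicas: list of (sequence_number, name) tuples, where the sequence
        number records the order in which replicas registered.
    live: set of names of the replicas that are currently up.
    Returns the name of the new leader, or None if no replica is up.
    """
    candidates = [(seq, name) for seq, name in replicas if name in live]
    if not candidates:
        return None
    # The lowest sequence number (earliest registration) wins.
    return min(candidates)[1]


replicas = [(0, "replica-a"), (1, "replica-b"), (2, "replica-c")]
first_leader = elect_leader(replicas, {"replica-a", "replica-b", "replica-c"})
# replica-a goes down, so the survivors elect a new leader.
next_leader = elect_leader(replicas, {"replica-b", "replica-c"})
```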

If you want to continue using master/slave semantics, I don't think you
can use SolrCloud.  Because every update is sent to every replica of a
shard, a SolrCloud cluster that spans your DCs will generate inter-DC
traffic whenever you index, which you probably want to avoid.

