lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: SolrCloud multi-datacenter failover?
Date Sat, 03 Jan 2015 04:28:46 GMT
bq: This is problematic because some portion of user activity will fail,
queries that are in transit will not complete

This is always interesting to think about, but is it a serious enough
problem to spend resources trying to anticipate? I can imagine situations
where even losing the queries in transit once per year is unacceptable,
but those are outliers; is yours _that_ critical?

I mean if you have data centers failing often enough that it impacts
users noticeably, you have waaaaay bigger problems than losing the results
for the current queries routed to that data center.

When was the last time one of your data centers went completely offline
anyway? I guess my point is that anticipating this kind of thing would be
way down on my priority list; personally, I'd ignore it.

That said, you know your situation best and maybe it's worth the effort,
but if I were the project manager I'd push back at the requirements people
pretty hard before spending engineering effort to try to anticipate such a
thing, engineering effort I'd be taking away from addressing the problems
that impact users all the time.

FWIW,
Erick


On Fri, Jan 2, 2015 at 1:52 PM, jaime spicciati
<jaime.spicciati@gmail.com> wrote:
> All,
>
> At my current customer we have developed a custom federator that will
> federate queries between Endeca and Solr to ease the transition from an
> extremely large (TBs of data) Endeca index to Solr. (Endeca is similar to
> Solr in terms of search/faceted navigation/etc).
>
>
>
> During this transition plan we need to support multi datacenter failover
> which we have historically handled via load balancers with the appropriate
> failover configurations (think F5). We are currently playing our dataloads
> into multiple datacenters to ensure data consistency. (Each datacenter has
> a stand-alone instance of solrcloud with its own redundancy/failover)
>
>
>
> I am curious to see how the community handles multi datacenter failover
> at the presentation layer (datacenter A goes down and we want to failover
> to B). Solrcloud within a datacenter will handle single datacenter failure
> within the instance, but in order to support multi datacenter failover I
> haven't seen a definitive ‘answer’ as to how to handle this situation.
>
>
>
> At this point the only two options I can come up with are
>
> 1) Fail the entire datacenter if Solrcloud goes offline (GUI/index/etc go
> offline)
>
>  - This is problematic because some portion of user activity will fail,
> queries that are in transit will not complete
>
> 2) Implement failover at the custom federator level. In doing so we would
> need to detect a failure at datacenter A within our federator, then query
> datacenter B to fulfill the user request, then potentially fail the entire
> datacenter A once all transactions have been fulfilled against A
>
>
>
> Since we are looking up the active solr instance via zookeeper (solrcloud)
> per datacenter I don’t see any reasonable means of failing over to another
> datacenter if a given solrcloud instance goes down.
>
>
> Any thoughts are welcome at this point.
>
> Thanks
>
> Jaime
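
Option 2 in the quoted message (detect a failure at datacenter A inside the federator, then query datacenter B) could be sketched roughly as below. This is a minimal illustration under stated assumptions, not the poster's federator: `Federator`, `AllDatacentersFailed`, and the per-datacenter query callables are hypothetical names, and each callable stands in for whatever client actually talks to that datacenter's SolrCloud.

```python
from typing import Callable, Dict, List, Set


class AllDatacentersFailed(Exception):
    """Raised when no datacenter could answer the query (hypothetical name)."""


class Federator:
    """Tries datacenters in preference order and remembers which ones failed.

    `backends` maps a datacenter name to a callable that runs a query there.
    The callables are assumptions for this sketch; in practice they would
    wrap the real search client for each datacenter.
    """

    def __init__(self, backends: Dict[str, Callable[[str], dict]], order: List[str]):
        self.backends = backends
        self.order = list(order)
        self.down: Set[str] = set()  # datacenters currently considered failed

    def query(self, q: str) -> dict:
        errors = {}
        for name in self.order:
            if name in self.down:
                continue  # skip datacenters already marked as failed
            try:
                return self.backends[name](q)
            except Exception as exc:  # broad on purpose: any error triggers failover
                self.down.add(name)
                errors[name] = exc
        raise AllDatacentersFailed(errors)

    def mark_up(self, name: str) -> None:
        """Put a datacenter back into rotation, e.g. after a health check passes."""
        self.down.discard(name)
```

With this shape, a query that fails against A is retried against B within the same user request, so only requests that fail in every datacenter surface an error. How a datacenter is declared healthy again (the `mark_up` call) is left open here.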
