lucene-general mailing list archives

From Chris Hostetter <>
Subject Re: Infrastructure for large Lucene index
Date Sat, 07 Oct 2006 02:31:44 GMT

: 3. If you're worried about high availability, then one fairly simple
: approach is to have two parallel set of search clusters, with a load
: balancer in front. For each cluster, monitor both the front-end
: server (where the results get combined) and each of the back-end
: search servers - for example, something like Big Brother or Ganglia.
: Then if one of the search servers (or, god forbid, the front end
: server) goes down, you can automatically remove that cluster from the
: load balancer's active set.

the availability of this approach doesn't scale very cleanly though ... if
any one box in either cluster goes down, the entire cluster becomes
unusable.  Doubling the size of your collection would only double the
number of boxes you need -- but the reliability would be cut in half,
meaning you'd really need to quadruple the number of boxes (doubling the
number of clusters) to maintain the same level of reliability ... if i'm
not mistaken the number of boxes would need to grow quadratically as your
index size grows linearly.
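
To put rough numbers on that, here is a minimal sketch of the arithmetic. It assumes every box is up independently with the same probability p, that a cluster is usable only when all of its boxes are up, and that the system is available when at least one cluster is usable. The function names, p = 0.99 and the 0.99 target are illustrative assumptions, not figures from this thread.

```python
def cluster_up(p, boxes):
    """Probability that one cluster is usable, assuming it needs every box up."""
    return p ** boxes


def clusters_needed(p, boxes, target):
    """Smallest number of identical parallel clusters such that at least one
    of them is usable with probability >= target (independent failures assumed)."""
    down = 1.0 - cluster_up(p, boxes)
    k = 1
    while 1.0 - down ** k < target:
        k += 1
    return k


def total_boxes(p, boxes, target):
    """Total hardware: boxes per cluster times the number of clusters needed."""
    return boxes * clusters_needed(p, boxes, target)


if __name__ == "__main__":
    # Hypothetical numbers: each box is up 99% of the time, target is 99% overall.
    for boxes in (10, 20, 40):
        print(boxes, clusters_needed(0.99, boxes, 0.99), total_boxes(0.99, boxes, 0.99))
```

With those assumed numbers, 10, 20 and 40 boxes per cluster come out to 20, 60 and 200 boxes in total. Under this simple model the hardware grows faster than the index does, although not at exactly a quadratic rate.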

A system where every individual node in the cluster is load balanced
across 2 physical boxes would require the same amount of hardware to start
with, but would require a lot less hardware to grow.
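
A rough sketch of why the per-node approach grows more gently. It assumes each box is up independently with probability p, that a logical node is reachable when at least one of its replica boxes is up, and that a search needs every logical node. The names, p = 0.99 and the 0.99 target are illustrative assumptions, not figures from this thread.

```python
def paired_up(p, nodes, replicas=2):
    """Probability that every logical node has at least one replica box up."""
    node_up = 1.0 - (1.0 - p) ** replicas
    return node_up ** nodes


def replicas_needed(p, nodes, target):
    """Smallest number of replica boxes per logical node such that the whole
    system is available with probability >= target (independent failures assumed)."""
    r = 1
    while paired_up(p, nodes, r) < target:
        r += 1
    return r


if __name__ == "__main__":
    # Hypothetical numbers: each box is up 99% of the time, target is 99% overall.
    for nodes in (10, 20, 40):
        r = replicas_needed(0.99, nodes, 0.99)
        print(nodes, r, nodes * r)
```

With those assumed numbers, two boxes per node are enough at 10, 20 and 40 nodes, so the hardware stays at twice the node count (20, 40 and 80 boxes) as the index grows.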

