lucene-solr-user mailing list archives

From Jan Høydahl
Subject Re: Programmatic restructuring of a Solr cloud
Date Thu, 05 May 2011 13:07:01 GMT

One approach, if you're using Amazon, is to use Elastic Beanstalk:

* Create one master with 12 cores, named "jan", "feb", "mar" etc
* Every month, you clear the current month index and switch indexing to it
  You will only have one master, because you're only indexing to one month at a time
* For each of the 12 months, set up an Amazon Beanstalk instance with a Solr
  replica pointing to its master core
  This way, Amazon will spin up replicas as needed
  NOTE: Your replica can still be located at /solr/select even if it
  replicates from /solr/may/replication
* You only query the replicas, and the client will control whether to query one or more shards
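
The monthly switch in the first two points could be scripted roughly as below. This is only a sketch: the master URL is a made-up placeholder and the helper names are hypothetical. It builds the two standard Solr XML update messages (a delete-by-query on `*:*`, then a commit) that would empty the month's core before indexing moves to it, and leaves the actual HTTP POST to whatever client you use.

```python
import datetime

# Core names as in the list above: "jan", "feb", "mar", ...
MONTH_CORES = ["jan", "feb", "mar", "apr", "may", "jun",
               "jul", "aug", "sep", "oct", "nov", "dec"]


def core_for(day):
    """Return the name of the core for the month that `day` falls in."""
    return MONTH_CORES[day.month - 1]


def rotation_requests(day, master="http://master.example.com:8983/solr"):
    """Build the (url, xml_body) POSTs that empty this month's core.

    The core still holds last year's data for the same month, so it is
    wiped before indexing switches over to it. The default `master` URL
    is a placeholder, not a real host.
    """
    update_url = "%s/%s/update" % (master, core_for(day))
    return [
        (update_url, "<delete><query>*:*</query></delete>"),
        (update_url, "<commit/>"),
    ]


if __name__ == "__main__":
    # Show what would be sent on the first day of May 2011.
    for url, body in rotation_requests(datetime.date(2011, 5, 1)):
        print("POST", url, body)
```

Run from a monthly cron job, the same `core_for` value also tells the indexer which core to write to for the rest of the month.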

Once this is set up, you have zero config to worry about :)
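
On the query side, the client picking one or more months might look something like this. Again just a sketch under assumptions: the per-month host naming (`may.search.example.com` and so on) is invented for illustration, and each month's replicas are assumed to sit behind one such load-balanced host. For more than one month it uses Solr's standard `shards` parameter for distributed search.

```python
from urllib.parse import urlencode


def replica_host(month):
    """Hypothetical naming scheme: one load-balanced host per month."""
    return "%s.search.example.com" % month


def select_url(months, query):
    """Build a /solr/select URL that searches the given month shards.

    With a single month the request simply goes to that month's replicas.
    With several months, the first one coordinates a distributed search
    across all of them via the `shards` parameter.
    """
    if not months:
        raise ValueError("at least one month is required")
    params = [("q", query)]
    if len(months) > 1:
        # Shard entries are host/path without the "http://" prefix.
        shards = ",".join("%s/solr" % replica_host(m) for m in months)
        params.append(("shards", shards))
    return "http://%s/solr/select?%s" % (replica_host(months[0]),
                                          urlencode(params))


if __name__ == "__main__":
    print(select_url(["may"], "error"))
    print(select_url(["may", "apr"], "error"))
```

Since nearly all traffic hits the current month, the common case is the single-month call, which never needs the `shards` parameter at all.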

Jan Høydahl, search solution architect
Cominvent AS

On 5 May 2011, at 14:03, Sergey Sazonov wrote:

> Dear Solr Experts,
> First of all, I would like to thank you for your patience when answering
> questions of those who are less experienced.
> And now to the main topic: I would like to learn whether it is possible to
> restructure a Solr cloud programmatically.
> Let me describe the system we are designing to make the requirements clear.
> The indexed documents are certain log entries. We are planning to shard them
> by month, and only keep the last 12 months in the index. We are going to
> replicate each shard across several servers.
> Now, the user is always required to search within a single month (= shard).
> Most importantly, we expect an absolute majority of the requests to query the
> current month, with only a minor load on the previous months. In order to
> utilise the cluster most efficiently, we would like a majority of the servers
> to contain replicas of the current month data, and have only one or two
> servers per older month. To this end, we are planning to have a set of slaves
> that "migrate" from master to master, depending on which master holds the
> data for the current month. When a new month starts, those slaves have to be
> reconfigured to hold the new shard and to replicate from the new master
> (their old master now holding the data for the previous month).
> Since this operation has to be done every month, we are naturally considering
> automating it. So my question is whether anyone has faced a similar problem
> before, and what is the best way to solve it. We are not committed to any
> solution, or even architecture, so feel free to propose different solutions.
> The only requirement is that a majority of the servers should be able to
> serve requests to the current month at any given moment.
> Thank you in advance for your answers.
> Best regards,
> Sergey Sazonov.
