lucene-solr-user mailing list archives

From Shawn Heisey <s...@elyograg.org>
Subject Re: solr sizing
Date Mon, 29 Jul 2013 21:38:50 GMT
On 7/29/2013 2:18 PM, Torsten Albrecht wrote:
> we have
>
> - 70 million to 100 million documents
>
> and we want
>
> - 800 requests per second
>
>
> How many servers (Amazon EC2 or real hardware) do we need for this?
>
> Should we use Solr 4.x with SolrCloud, or is sharding with a load balancer better?
>
> Is there anyone here who can give me some information, or who operates a similar system?

Your question is impossible to answer, aside from generalities that 
won't really help all that much.

I have a similarly sized system (82 million docs), but I don't have 
query volume anywhere near what yours is.  I've got fewer than 10 queries 
per second.  I have two copies of my index.  I use a load balancer with 
traditional sharding.

I don't do replication; my two index copies are completely independent. 
I set it up this way long before SolrCloud was released.  Having two 
completely independent indexes lets me do a lot of experimentation that 
a typical SolrCloud setup won't let me do.

One copy of the index is running 3.5.0 and is about 142GB if you add up 
all the shards.  The other copy of the index is running 4.2.1 and is 
about 87GB on disk.  Each copy of the index runs on two servers and is 
split into six large cold shards and one small hot shard.  Each of those servers has 
two quad-core processors (Xeon E5400 series, so fairly old now) and 64GB 
of RAM.  I can get away with multiple shards per host because my query 
volume is so low.

Here's a screenshot of a status servlet that I wrote for my index. 
There's tons of info here about my index stats:

https://dl.dropboxusercontent.com/u/97770508/statuspagescreenshot.png

If I needed to start over from scratch with your higher query volume, I 
would probably set up two independent SolrCloud installs, each with a 
replicationFactor of at least two, and I'd use 4-8 shards.  I would put 
a load balancer in front of it so that I could bring one cloud down and 
have everything still work, though with lower performance.  Because of 
the query volume, I'd only have one shard per host.  Depending on how 
big the index ended up being, I'd want 16-32GB (or possibly more) RAM 
per host.
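
As a rough illustration of one such cloud, the sketch below builds a 
Collections API CREATE request for 4 shards with a replicationFactor of 2 
and at most one shard per node.  The host, port, and collection name 
("products") are placeholders, not anything from this thread, and the 
request is only printed, not sent.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor):
    """Build (but do not send) a Solr Collections API CREATE URL."""
    params = {
        "action": "CREATE",
        "name": name,                       # placeholder collection name
        "numShards": num_shards,            # 4-8 suggested above
        "replicationFactor": replication_factor,
        "maxShardsPerNode": 1,              # one shard per host
    }
    return base_url.rstrip("/") + "/admin/collections?" + urlencode(params)


# Hypothetical host and collection name, for illustration only.
url = create_collection_url("http://localhost:8983/solr", "products", 4, 2)
print(url)
```

With one shard per node, this layout needs numShards x replicationFactor 
nodes (8 here) to be running before the collection can be created.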

You might not need the flexibility of two independent clouds, and it 
would require additional complexity in your indexing software.  If you 
only went with one cloud, you'd just need a higher replicationFactor.
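
The trade-off between the two layouts can be sketched with simple 
arithmetic.  This is only a back-of-the-envelope model under assumptions 
that are not from the thread: one shard replica per host, queries spread 
evenly across all copies, and every query touching one replica of every 
shard.  It ignores the extra work done by the host that merges the shard 
responses.

```python
def layout(clouds, shards, replication_factor, total_qps):
    """Return (hosts, copies_per_shard, qps_per_host) for a layout.

    Assumes one shard replica per host and evenly balanced queries.
    """
    hosts = clouds * shards * replication_factor
    copies_per_shard = clouds * replication_factor
    # Each query hits one copy of every shard, so the copies of a shard
    # share the full query rate between them.
    qps_per_host = total_qps / copies_per_shard
    return hosts, copies_per_shard, qps_per_host


# Two independent clouds, replicationFactor 2, 4 shards, 800 requests/second.
two_clouds = layout(clouds=2, shards=4, replication_factor=2, total_qps=800)

# One cloud with a higher replicationFactor and the same number of copies.
one_cloud = layout(clouds=1, shards=4, replication_factor=4, total_qps=800)

print(two_clouds)  # (16, 4, 200.0)
print(one_cloud)   # (16, 4, 200.0)
```

Under these assumptions both layouts need the same number of hosts; the 
difference is operational (being able to take a whole cloud down) rather 
than in raw capacity.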

I'd also want to have another set of servers (not as beefy) to have 
another independent SolrCloud with a replicationFactor of 1 or 2 for dev 
purposes.

That's a LOT of hardware, and it would NOT be cheap.  Can I be sure that 
you'd really need that much hardware?  Not really.  To be quite 
honest, you'll just have to set up a proof-of-concept system and be 
prepared to make it bigger.

Thanks,
Shawn

