lucene-solr-user mailing list archives

From: Shawn Heisey
Subject: Re: DIH to index the data - 250 millions - Need a best architecture
Date: Tue, 30 Jul 2013 06:50:10 GMT

On 7/30/2013 12:23 AM, Santanu8939967892 wrote:
>      Yes, your assumption is correct. The index size is around 250 GB and
> we index 20/30 meta data and store around 50.
>      We have plan for a Solr cloud architecture having two nodes one Master
> and other one is replica of the master (replication factor 2) with multiple
> zookeeper ensemble. We will have multiple shards for each Master and
> replica node.
> Is above architecture a fit for production deployment for an improved index
> and query performance.
> Do we require 64 GB RAM or less will work for us.

It sounds like you're planning to put the entire index on one server,
and then have a replica on another server.  You'll have multiple shards,
but they won't be running on separate hardware.  Running multiple shards
per server is a strategy that can work well if you have a lot of CPU cores
and a low query volume.  When the query volume gets really high, you
will want fewer shards per server and more servers.

If your index is on spinning disks, I wouldn't try to run an index of
that size on a host with less than 128GB RAM, and I'd try to get 256GB.
If you have to choose between super-high-end CPUs and memory, choose
memory ... but don't skimp TOO much on the CPUs.  The amount of RAM
required for each server will go down if you spread the shards out
across more servers.

If the index is on SSD, 64GB might work OK, but 128GB would be better.
If your query volume is low, 64GB might even work for spinning disks,
but the query latency might be fairly high.
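
To make the RAM reasoning above concrete, here is a rough back-of-the-envelope sketch. It assumes the 250 GB index is split evenly across the servers that hold one full copy, and that whatever RAM is left after the Java heap is available to the OS disk cache. The function name and the 8 GB heap figure are illustrative assumptions, not recommendations from this thread.

```python
def cache_coverage(index_gb, servers_per_copy, ram_gb, heap_gb):
    """Return (index share per server in GB, fraction of that share
    that could fit in the OS disk cache)."""
    if servers_per_copy < 1:
        raise ValueError("need at least one server per index copy")
    # Assumption: shards are spread evenly, so each server holds an equal share.
    share_gb = index_gb / servers_per_copy
    # Assumption: everything not used by the JVM heap can cache index files.
    cache_gb = max(ram_gb - heap_gb, 0)
    return share_gb, min(cache_gb / share_gb, 1.0)


if __name__ == "__main__":
    # 250 GB is the index size from the thread; heap_gb=8 is a placeholder.
    for servers, ram in [(1, 64), (1, 128), (1, 256), (2, 128), (4, 64)]:
        share, coverage = cache_coverage(250, servers, ram, heap_gb=8)
        print(f"{servers} server(s) per copy, {ram} GB RAM each: "
              f"{share:.1f} GB of index per server, "
              f"about {coverage:.0%} cacheable")
```

The numbers only show the trend: spreading the same index over more servers shrinks each server's share, so the same amount of RAM caches a larger fraction of it. How much of the index really needs to be cached depends on the queries, which only testing will show.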

If you require a very high query volume, two replicas might not be
enough, and you wouldn't want to run a lot of shards per server.  You'd
have to actually set up a proof of concept and run tests with real data
and real queries to find out for sure what you need.

In case it isn't clear by now - assuming you've got enough RAM for good
disk caching, query volume will dictate how many actual servers you need.
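
As a rough illustration of that last point, the sketch below estimates a server count from a target query rate. It assumes throughput scales about linearly with the number of full index copies, which is something a proof of concept would have to confirm. The function name and all of the numbers are hypothetical; the per-copy rate in particular has to come from your own tests with real data and real queries.

```python
import math


def servers_needed(target_qps, measured_qps_per_copy, servers_per_copy):
    """Return (index copies, total servers) for a target query rate.

    measured_qps_per_copy is the rate one full copy of the index sustained
    at acceptable latency in your own testing (hypothetical input).
    """
    if measured_qps_per_copy <= 0 or servers_per_copy < 1:
        raise ValueError("measured rate and servers per copy must be positive")
    # Assumption: each additional full copy adds about the same capacity.
    copies = max(1, math.ceil(target_qps / measured_qps_per_copy))
    return copies, copies * servers_per_copy


if __name__ == "__main__":
    # Placeholder figures: 100 queries/sec wanted, 30 queries/sec measured
    # per copy, each copy spread over 2 servers.
    copies, total = servers_needed(100, 30, 2)
    print(f"{copies} copies of the index on {total} servers")
```

For redundancy you would still want at least two copies even when one could carry the load.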

