lucene-solr-user mailing list archives

From Santanu8939967892 <mishra.sant...@gmail.com>
Subject Re: DIH to index the data - 250 millions - Need a best architecture
Date Tue, 30 Jul 2013 11:59:47 GMT
Hi Shawn,
     Thanks for your detailed explanation.
Will do a POC and finalize the arch.

With Regards,
Santanu


On Tue, Jul 30, 2013 at 12:20 PM, Shawn Heisey <solr@elyograg.org> wrote:

> On 7/30/2013 12:23 AM, Santanu8939967892 wrote:
> >      Yes, your assumption is correct. The index size is around 250 GB,
> > and we index 20-30 metadata fields and store around 50.
> >      We plan a SolrCloud architecture with two nodes, one master and
> > one replica of the master (replication factor 2), backed by a
> > ZooKeeper ensemble. We will have multiple shards on each master and
> > replica node.
> > Is the above architecture a fit for production deployment with good
> > indexing and query performance?
> > Do we require 64 GB RAM, or will less work for us?
>
> It sounds like you're planning to put the entire index on one server,
> and then have a replica on another server.  You'll have multiple shards,
> but they won't be running on separate hardware.  Running multiple shards
> per server is a strategy that can work well if you have a lot of CPU cores
> and a low query volume.  When the query volume gets really high, you
> will want fewer shards per server and more servers.
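
As a concrete illustration of that layout, a SolrCloud collection with several
shards per node and one replica of each is normally created up front through
the Collections API. A rough sketch, where the host, collection name, and
shard count are placeholders rather than values from this thread:

    http://host1:8983/solr/admin/collections?action=CREATE
        &name=bigindex&numShards=4&replicationFactor=2
        &maxShardsPerNode=4&collection.configName=bigindexconf

With two Solr nodes this places four shard replicas on each box; with four
nodes and maxShardsPerNode=2, the same shards spread out over more hardware.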
>
> If your index is on spinning disks, I wouldn't try to run an index of
> that size on a host with less than 128GB RAM, and I'd try to get 256GB.
> If you have to choose between super-high-end CPUs and memory, choose
> memory ... but don't skimp TOO much on the CPUs.  The amount of RAM
> required for each server will go down if you spread the shards out
> across more servers.
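
As a rough back-of-the-envelope illustration of that last point (the
four-server split below is just an example, not a figure from this thread):
on a single box the OS disk cache is competing to hold all ~250 GB of index,
hence the 128-256 GB advice above; split the same index across four servers
and each one holds roughly 250 / 4 ≈ 63 GB of index, so a 96-128 GB machine
leaves room for both the disk cache and the Solr heap.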
>
> If the index is on SSD, 64GB might work OK, but 128GB would be better.
> If your query volume is low, 64GB might even work for spinning disks,
> but the query latency might be fairly high.
>
> If you require a very high query volume, two replicas might not be
> enough, and you wouldn't want to run a lot of shards per server.  You'd
> have to actually set up a proof of concept and run tests with real data
> and real queries to find out for sure what you need.
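
What "real data and real queries" boils down to in practice is measuring
latency against the prototype cluster. A minimal SolrJ sketch of that
measurement, where the node URL, collection name, and query strings are
placeholders that would come from your own logs:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class QueryLatencyProbe {
        public static void main(String[] args) throws Exception {
            // Point at one node of the prototype cluster; "bigindex" is a placeholder name.
            HttpSolrServer server = new HttpSolrServer("http://host1:8983/solr/bigindex");
            // Replace with queries pulled from real query logs.
            String[] sampleQueries = { "title:foo", "body:bar AND title:baz" };
            for (String q : sampleQueries) {
                long start = System.currentTimeMillis();
                QueryResponse rsp = server.query(new SolrQuery(q));
                long wallMillis = System.currentTimeMillis() - start;
                // QTime is Solr's internal search time; wall time also includes
                // network transfer and response parsing.
                System.out.println(q + " qtime=" + rsp.getQTime() + "ms"
                        + " wall=" + wallMillis + "ms"
                        + " numFound=" + rsp.getResults().getNumFound());
            }
            server.shutdown();
        }
    }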
>
> In case it isn't clear by now - assuming you've got enough RAM for good
> disk caching, query volume will dictate how many actual servers you need.
>
> Thanks,
> Shawn
>
>
