lucene-solr-user mailing list archives

From Walter Underwood <wun...@wunderwood.org>
Subject Re: Very very large scale Solr Deployment = how to do (Expert Question)?
Date Thu, 07 Apr 2011 03:20:14 GMT
The bigger answer is that you cannot get to this size by just configuring Solr. You may have
to invent a lot of stuff. Like all of Google.

Where did you get these numbers? The proposed query rate is nearly three times Google's (Feb 2010
estimate, 34K qps).
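
For a sense of scale, here is a rough back-of-the-envelope sketch of the stated goals. It assumes decimal units (1 PB = 10^15 bytes, 1 KB = 1000 bytes) and takes the 34K qps estimate at face value. The 100 GB shard size is purely a hypothetical figure for illustration, not a Solr recommendation.

```python
# Back-of-the-envelope sizing for the goals in the original question.
# Assumption: decimal units (1 PB = 10**15 bytes, 1 KB = 10**3 bytes).

INDEX_BYTES = 10**15           # goal A: 1 petabyte of index
DOC_BYTES = 5 * 10**3          # goal A: about 5 KB per document
TARGET_QPS = 100_000           # goal B: queries per second
GOOGLE_QPS_ESTIMATE = 34_000   # Feb 2010 estimate quoted in this thread


def document_count(index_bytes: int, doc_bytes: int) -> int:
    """Number of documents that fit in the index at the given document size."""
    return index_bytes // doc_bytes


def qps_ratio(target_qps: int, reference_qps: int) -> float:
    """How many times larger the target query rate is than the reference."""
    return target_qps / reference_qps


def shards_needed(index_bytes: int, shard_bytes: int) -> int:
    """Shards required if each holds at most shard_bytes (ceiling division)."""
    return -(-index_bytes // shard_bytes)


if __name__ == "__main__":
    hypothetical_shard_bytes = 100 * 10**9  # assumed 100 GB per shard
    print("documents:", document_count(INDEX_BYTES, DOC_BYTES))
    print("qps ratio:", round(qps_ratio(TARGET_QPS, GOOGLE_QPS_ESTIMATE), 2))
    print("shards:", shards_needed(INDEX_BYTES, hypothetical_shard_bytes))
```

Under these assumptions the index holds about 200 billion documents, and even a generous per-shard size leaves thousands of shards to coordinate.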

I work at MarkLogic, and we scale to hundreds of terabytes, with fast update and query rates.
If you want a real system that handles that, you might want to look at our product.

wunder

On Apr 6, 2011, at 8:06 PM, Lance Norskog wrote:

> I would not use replication. LinkedIn consumer search is a flat system
> where one process indexes new entries and does queries simultaneously.
> It's a custom Lucene app called Zoie. Their stuff is on GitHub.
> 
> I would get documents to indexers via a multicast IP-based queueing
> system. This scales very well and there's a lot of hardware support.
> 
> The problem with distributed search is that it is a) inherently slower
> and b) has inherently more and longer jitter. The "airplane wing"
> distribution of query times becomes longer and flatter.
> 
> This is going to have to be a "federated" system, where the front-end
> app aggregates results rather than Solr.
> 
> On Mon, Apr 4, 2011 at 6:25 PM, Jens Mueller <supidupi007@googlemail.com> wrote:
>> Hello Experts,
>> 
>> I am a Solr newbie but have read quite a lot of docs. I still do not understand
>> what would be the best way to set up very large scale deployments:
>> 
>> Goal (theoretical):
>> 
>>  A) Index size: 1 petabyte (one document is about 5 KB in size)
>>  B) Queries: 100,000 queries per second
>>  C) Updates: 100,000 updates per second
>> 
>> Solr offers:
>> 
>> 1.) Replication => Scales well for B) BUT A) and C) are not satisfied
>> 
>> 2.) Sharding => Scales well for A) BUT B) and C) are not satisfied (=> As
>> I understand the sharding approach, everything goes through a central server that
>> dispatches the updates and assembles the query results retrieved from the different
>> shards. But this central server also has capacity limits...)
>> 
>> What is the right approach to handle such large deployments? I would be
>> thankful for just a rough sketch of the concepts so I can experiment/search
>> further…
>> 
>> Maybe I am missing something very trivial, as I think some of the “Solr
>> Users/Use Cases” on the homepage are that kind of large deployment. How are
>> they implemented?
>> 
>> Thank you very much!
>> 
>> Jens
>> 
> 
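
The point in the thread about distributed search having "more and longer jitter" can be illustrated with a small calculation. This is only a sketch under a simplifying assumption that is not from the thread: each shard independently exceeds a latency threshold with probability p, and a fan-out query is slow whenever any one of its n shards is slow.

```python
# Tail-latency amplification in a fan-out query.
# Assumption (illustrative only): shards are slow independently of each
# other, each with the same probability p_slow per request.


def fanout_slow_probability(p_slow: float, n_shards: int) -> float:
    """Probability that at least one of n_shards responds slowly."""
    if not 0.0 <= p_slow <= 1.0:
        raise ValueError("p_slow must be between 0 and 1")
    if n_shards < 0:
        raise ValueError("n_shards must be non-negative")
    # The query is fast only if every shard is fast: (1 - p)^n.
    return 1.0 - (1.0 - p_slow) ** n_shards


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        prob = fanout_slow_probability(0.01, n)
        print(f"{n:>5} shards -> {prob:.3f} chance of a slow query")
```

With a 1% chance of a slow response per shard, a query that fans out to 100 shards is slow about 63% of the time under this model, which is one way to read the "longer and flatter" distribution described above.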

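The "federated" idea mentioned in the thread, where the front-end app aggregates results instead of Solr, might look roughly like the sketch below. Everything here is hypothetical: the `ShardSearch` callables stand in for whatever client talks to each index, the timeout value is arbitrary, and the merge assumes scores from different shards are directly comparable, which is often not true in practice.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, List, Sequence, Tuple

Hit = Tuple[float, str]                        # (score, document id)
ShardSearch = Callable[[str, int], List[Hit]]  # hypothetical shard client


def federated_search(
    query: str,
    shards: Sequence[ShardSearch],
    k: int = 10,
    timeout_s: float = 0.5,
) -> List[Hit]:
    """Query every shard concurrently and merge the top-k hits by score.

    Shards that fail or miss the deadline are skipped, so the result may be
    partial. That trades completeness for a bounded response time.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(shards)))
    futures = [pool.submit(shard, query, k) for shard in shards]
    done, _not_done = wait(futures, timeout=timeout_s)
    # Do not block on stragglers; their results are simply ignored.
    pool.shutdown(wait=False)

    hits: List[Hit] = []
    for future in done:
        if future.exception() is None:
            hits.extend(future.result())
    # Assumes scores are comparable across shards (an assumption, see above).
    return heapq.nlargest(k, hits)


if __name__ == "__main__":
    def shard_a(query: str, k: int) -> List[Hit]:
        return [(0.9, "a1"), (0.2, "a2")]

    def shard_b(query: str, k: int) -> List[Hit]:
        return [(0.7, "b1"), (0.5, "b2")]

    print(federated_search("solr", [shard_a, shard_b], k=3))
```

The deadline is the design choice that matters here: the aggregator answers with whatever has arrived, rather than letting the slowest shard set the response time.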



