lucene-solr-user mailing list archives

From Jens Mueller <supidupi...@googlemail.com>
Subject Re: Very very large scale Solr Deployment = how to do (Expert Question)?
Date Thu, 07 Apr 2011 05:29:40 GMT
Hello Ephraim, hello Lance, hello Walter,

thanks for your replies:

Ephraim, thanks very much for the further detailed explanation. I will try
to set up a demo system in the next few days and use your advice.
Load balancers are an important aspect of your design. Can you recommend one
LB specifically? (I would be using HAProxy, haproxy.1wt.eu.) I think the idea
of uploading your document is very good. However, Google Docs did not seem to
be working (at least for me, with the docx format), but maybe you can simply
export the document as PDF, which I think Google Docs handles, so all
the others can also have a look at your concept. The best approach would be
to upload your advice directly to the Solr wiki, as it is really helpful.
I have found some other documents in the meantime (
http://lucene-eurocon.org/slides/Solr-In-The-Cloud_Mark-Miller.pdf), but
yours is much clearer and more complete, with the LBs and the aggregators.
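
The aggregator role mentioned above can be illustrated with a minimal sketch.
It assumes each shard returns its hits already sorted by descending score as
(score, doc_id) pairs; the function name, the data and the tuple layout are
made up for illustration and are not taken from Ephraim's document or from
Solr itself.

```python
import heapq
from itertools import islice


def merge_top_k(shard_results, k):
    """Merge per-shard hit lists into one global top-k list.

    shard_results: list of lists of (score, doc_id) tuples; every inner
    list must already be sorted by score, highest first (an assumption).
    """
    # heapq.merge lazily merges pre-sorted inputs; reverse=True tells it
    # the inputs are ordered from largest to smallest.
    merged = heapq.merge(*shard_results, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, k))


# Hypothetical results from three shards for the same query.
shard_a = [(0.92, "a1"), (0.40, "a2")]
shard_b = [(0.88, "b1"), (0.75, "b2")]
shard_c = [(0.95, "c1")]

top3 = merge_top_k([shard_a, shard_b, shard_c], k=3)
print(top3)
```

Because each shard only has to return its own best k hits, the aggregator
handles at most k times the number of shards entries per query.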

Lance, thanks, I will have a look at what LinkedIn is doing.

Walter, thanks for the advice. You are right to mention Google. My
question was also meant to understand how large systems like Google/Facebook
actually work, so my numbers are just theoretical and made up. My
system will be smaller, but I would be very happy to understand how such
large systems are built, and I think the approach Ephraim showed should
work quite well at large scale. If you know a good document (besides the
Bigtable research paper, which I already know) that technically describes how
Google works in detail, that would be of great interest. You seem to be
working for a company that handles large datasets. Does Google use this
approach of sharding the index across N writers, with the produced index then
replicated to N "read only searchers"?
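
To make the "N writers plus replicated read-only searchers" idea concrete,
here is a rough sketch of the routing logic only. Everything in it is an
assumption for illustration: the class, the host names, hashing the document
ID to pick a shard, and round-robin replica selection. It says nothing about
how Google, or any particular Solr setup, actually does this.

```python
import hashlib
from itertools import cycle


class ShardedCluster:
    """Toy router: one writer per shard, several read-only replicas per shard."""

    def __init__(self, num_shards, replicas_per_shard):
        self.num_shards = num_shards
        # Hypothetical host names; a real deployment would use real addresses.
        self.writers = [f"writer-{s}" for s in range(num_shards)]
        self._searchers = [
            cycle([f"searcher-{s}-{r}" for r in range(replicas_per_shard)])
            for s in range(num_shards)
        ]

    def shard_for(self, doc_id):
        # A stable hash of the document ID decides which shard owns it.
        digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.num_shards

    def writer_for(self, doc_id):
        # Updates for one document always go to the same writer.
        return self.writers[self.shard_for(doc_id)]

    def searchers_for_query(self):
        # A query must reach every shard; pick one replica per shard,
        # rotating round-robin so read load spreads over the replicas.
        return [next(replicas) for replicas in self._searchers]


cluster = ShardedCluster(num_shards=4, replicas_per_shard=2)
print(cluster.writer_for("doc-42"))
print(cluster.searchers_for_query())
```

In this picture, adding shards spreads index size and update load, while
adding replicas per shard spreads query load.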

thank you all.
best regards
jens



2011/4/7 Walter Underwood <wunder@wunderwood.org>

> The bigger answer is that you cannot get to this size by just configuring
> Solr. You may have to invent a lot of stuff. Like all of Google.
>
> Where did you get these numbers? The proposed query rate is twice as big as
> Google (Feb 2010 estimate, 34K qps).
>
> I work at MarkLogic, and we scale to 100's of terabytes, with fast update
> and query rates. If you want a real system that handles that, you might want
> to look at our product.
>
> wunder
>
> On Apr 6, 2011, at 8:06 PM, Lance Norskog wrote:
>
> > I would not use replication. LinkedIn consumer search is a flat system
> > where one process indexes new entries and does queries simultaneously.
> > It's a custom Lucene app called Zoie. Their stuff is on GitHub.
> >
> > I would get documents to indexers via a multicast IP-based queueing
> > system. This scales very well and there's a lot of hardware support.
> >
> > The problem with distributed search is that it is a) inherently slower
> > and b) has inherently more and longer jitter. The "airplane wing"
> > distribution of query times becomes longer and flatter.
> >
> > This is going to have to be a "federated" system, where the front-end
> > app aggregates results rather than Solr.
> >
> > On Mon, Apr 4, 2011 at 6:25 PM, Jens Mueller <supidupi007@googlemail.com> wrote:
> >> Hello Experts,
> >>
> >>
> >>
> >> I am a Solr newbie but have read quite a lot of docs. I still do not
> >> understand what would be the best way to set up very large scale
> >> deployments:
> >>
> >>
> >>
> >> Goal (theoretical):
> >>
> >>  A) Index size: 1 petabyte (1 document is about 5 KB in size)
> >>
> >>  B) Queries: 100000 queries per second
> >>
> >>  C) Updates: 100000 updates per second
> >>
> >>
> >>
> >>
> >> Solr offers:
> >>
> >> 1.)    Replication => Scales well for B) BUT A) and C) are not satisfied
> >>
> >>
> >> 2.)    Sharding => Scales well for A) BUT B) and C) are not satisfied
> >> (=> As I understand the sharding approach, everything goes through a
> >> central server that dispatches the updates and assembles the query
> >> results retrieved from the different shards. But this central server
> >> also has some capacity limits...)
> >>
> >>
> >>
> >>
> >> What is the right approach to handle such large deployments? I would be
> >> thankful for just a rough sketch of the concepts so I can
> >> experiment/search further…
> >>
> >>
> >> Maybe I am missing something very trivial, as I think some of the “Solr
> >> Users/Use Cases” on the homepage are that kind of large deployment. How
> >> are they implemented?
> >>
> >>
> >>
> >> Thank you very much!!!
> >>
> >> Jens
> >>
> >
>
>
>
>
>
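
For a sense of scale, the theoretical goals quoted in this thread can be
turned into a few back-of-the-envelope numbers. The sketch below uses decimal
units (1 PB = 10^15 bytes, 5 KB = 5000 bytes) and treats index size as equal
to raw document size. The 100 GB per shard figure is purely an assumption
chosen for illustration, not a recommendation from anyone in the thread.

```python
# Goals from the original question.
INDEX_SIZE_BYTES = 10**15          # A) 1 petabyte (decimal units assumed)
DOC_SIZE_BYTES = 5 * 10**3         # about 5 KB per document
QUERIES_PER_SEC = 100_000          # B)
UPDATES_PER_SEC = 100_000          # C)

# Assumption for illustration only: how much index one shard holds.
SHARD_SIZE_BYTES = 100 * 10**9     # 100 GB per shard (hypothetical)

# Estimate quoted by Walter in this thread.
GOOGLE_QPS_ESTIMATE = 34_000       # Feb 2010 estimate

num_docs = INDEX_SIZE_BYTES // DOC_SIZE_BYTES
update_bytes_per_sec = UPDATES_PER_SEC * DOC_SIZE_BYTES
num_shards = INDEX_SIZE_BYTES // SHARD_SIZE_BYTES
days_to_index_everything = num_docs / UPDATES_PER_SEC / 86_400
qps_ratio = QUERIES_PER_SEC / GOOGLE_QPS_ESTIMATE

print(f"documents:           {num_docs:,}")
print(f"update throughput:   {update_bytes_per_sec / 10**6:.0f} MB/s")
print(f"shards at 100 GB:    {num_shards:,}")
print(f"days to index all:   {days_to_index_everything:.1f}")
print(f"query rate vs. 34K:  {qps_ratio:.1f}x")
```

Under these assumptions the index holds 200 billion documents, and feeding
them in at the stated update rate alone would take a little over three weeks.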
