lucene-solr-user mailing list archives

From "Jack Krupansky" <>
Subject Re: LotsOfCores feature
Date Fri, 07 Jun 2013 00:32:12 GMT
I'm glad Erick finally answered my question (I think I actually asked it on 
the original Jira) concerning the rough magnitude of "Lots" - it's 
hundreds/thousands, but not hundreds of thousands, millions, or tens of 
millions.

So, if an app needs "millions", I think that suggests a "MegaCores" 
capability distinct from "LotsOfCores".

A use case would be a web site or service that had millions of users, each of 
whom would have an active Solr core when they are active, but inactive 
otherwise. Of course those cores would not all reside on one node and 
ZooKeeper is out of the question for managing anything that is in the 
millions. This would be a true "cloud" or "data center" and even multi-data 
center app, not a "cluster" app.

So, I imagine that the app's "cloud" would have ZooKeeper-like servers whose 
job is to know all the available servers in the cloud and what Solr cores 
are running on them and how much spare capacity they have. If a request 
comes in to "find" a user's Solr core, the CloudKeeper would consult its 
database (probably a Solr core with "millions" of rows!) for the current location 
and status of the user's core. If the core is active, great, its location is 
returned. If not active, CK would check to see if the node on which it 
resides has sufficient spare compute capacity. If so, the user's Solr core 
would be spun up. If not, CK would find a machine with plenty of spare 
capacity, send a request to that node to pull the inactive core from the 
busy machine to the new node (or from a backup store of long idle Solr 
cores). Once the new node has the user's Solr core up, the node notifies CK 
of its status and CK updates its database. Meanwhile, the original client 
request would have returned with an "in progress" status and the client 
would periodically ping CK to see if progress had completed.

And then there would probably be an idle timeout that would cause a Solr 
core to spin down and notify CK that it is inactive.

Or something like that.
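The flow above might be sketched roughly like this. To be clear, everything here 
is invented for illustration (CloudKeeper, the registry, the capacity check); 
none of it is Solr or ZooKeeper API, just the bookkeeping the design implies:

```python
# Hypothetical sketch of the "CloudKeeper" lookup/spin-up/spin-down flow.
# All names are made up; spin-up is really asynchronous (the client would
# poll on an "in_progress" status), which is elided here.

class CloudKeeper:
    def __init__(self, capacity_per_node):
        self.capacity = capacity_per_node   # max active cores per node
        self.registry = {}                  # user -> (node, is_active)
        self.active_count = {}              # node -> active core count
        self.nodes = []

    def add_node(self, node):
        self.nodes.append(node)
        self.active_count[node] = 0

    def find_core(self, user):
        """Return ('ready', node) or ('in_progress', node)."""
        node, active = self.registry.get(user, (None, False))
        if node is not None and active:
            return ('ready', node)          # core already spun up
        if node is not None and self.active_count[node] < self.capacity:
            target = node                   # spin up in place
        else:
            # migrate the idle core (or restore it from the backup
            # store) to the least-loaded node in the cloud
            target = min(self.nodes, key=lambda n: self.active_count[n])
        self.registry[user] = (target, True)
        self.active_count[target] += 1
        return ('in_progress', target)      # client polls until ready

    def idle_timeout(self, user):
        """Spin a core down and mark it inactive in the registry."""
        node, active = self.registry.get(user, (None, False))
        if active:
            self.registry[user] = (node, False)
            self.active_count[node] -= 1
```

The interesting part is only the three-way branch in find_core: already active, 
inactive but room on its current node, or inactive and needing a move.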

This would be a lot more of a true "Solr Cloud" than the "cluster" support 
that we have today.

And the "CloudKeeper" itself might be a "traditional" SolrCloud cluster, 
except that it needs to be multi-data center.

-- Jack Krupansky

-----Original Message----- 
From: Aleksey
Sent: Thursday, June 06, 2013 8:06 PM
To: solr-user
Subject: Re: LotsOfCores feature

I would not try putting tens of millions of cores on one machine. My
question (and I think Jack's as well) was around having them across a
fleet, say if I need 1M then I'd get 100 machines appropriately sized
for 10K each. I was clarifying because there was some talk about
ZooKeeper only being able to store a small amount of configuration and
there were concerns that it won't keep information about which core is
where if it's millions.

This question is still open in my mind, since I haven't yet
familiarized myself with how ZK works.

On Thu, Jun 6, 2013 at 3:23 PM, Erick Erickson <> wrote:
> Now Jack. You know "it depends" <G>.... Just answer
> the questions "how many simultaneous cores can you
> open on your hardware", and "what's the maximum percentage
> of the cores you expect to be open at any one time".
> Do some math and you have your answer.....
> The meta-data, essentially anything in the <core> tag
> or the file is kept in an in-memory structure. At
> startup time, that structure has to be filled. I haven't measured
> exactly, but it's relatively small (GUESS: 256 bytes) plus control
> structures. So _theoretically_ you could put millions on a single
> node. But you don't want to because:
> 1> if you're doing core discovery, you have to walk millions of
>      directories every time you start up.
> 2> otherwise you're maintaining a huge solr.xml file (which will be
>     going away anyway).
> Aleksey's use case also calls for "less than a million" or so open
> at once. I can't imagine fitting that many cores into memory
> simultaneously on one machine.
> The design goal is 10-15K cores on a machine. The theory
> is that pretty soon you're going to have a big enough percentage
> of them open that you'll blow memory up.
> And this is always governed by the size of the transient cache.
> Pretty soon you'll be opening a core for each and every query if
> you have more requests coming in for unique cores than your
> cache size.
> So, as usual, it's a matter of the usage pattern to determine how
> many cores you can put on the machine.
> Erick
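Erick's "do some math" can be written out as two back-of-envelope formulas. Note 
that the 256-bytes-per-core figure is his stated guess, not a measured number, 
and the per-open-core cost is entirely workload-dependent:

```python
# Back-of-envelope capacity math for lots-of-cores sizing.
# bytes_per_core=256 is Erick's guess for the in-memory core metadata.

def startup_metadata_mb(num_cores, bytes_per_core=256):
    """Rough footprint of keeping every core's descriptor in memory."""
    return num_cores * bytes_per_core / (1024 * 1024)

def max_open_cores(heap_mb, mb_per_open_core):
    """How many cores can be open at once for a given heap, assuming a
    (workload-dependent) memory cost per open core."""
    return heap_mb // mb_per_open_core
```

So a million core descriptors cost on the order of a few hundred MB of metadata, 
which is why millions are "theoretically" possible per node, while the number of 
simultaneously *open* cores is what actually blows memory up.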
> On Thu, Jun 6, 2013 at 4:13 PM, Jack Krupansky <> 
> wrote:
>> So, is that a clear yes or a clear no for Aleksey's use case - 10's of
>> millions of cores, not all active but each loadable on demand?
>> I asked this same basic question months ago and there was no answer
>> forthcoming.
>> -- Jack Krupansky
>> -----Original Message----- From: Erick Erickson
>> Sent: Thursday, June 06, 2013 3:53 PM
>> To:
>> Subject: Re: LotsOfCores feature
>> 100K is really not the limit, it's just hard to imagine
>> 100K cores on a single machine unless some were
>> really rarely used. And it's per node, not cluster-wide.
>> The current state is that everything is in place, including
>> transient cores, auto-discovery, etc. So you should be
>> able to go ahead and try it out.
>> The next bit that will help with efficiency is sharing named
>> config sets. The intent here is that <solrhome>/configs will
>> contain sub-dirs like "conf1", "conf2" etc. Then your cores
>> can reference configName=conf1 and only one copy of
>> the configuration data will be used rather than re-loading one
>> for each core as it comes up and down.
>> Do note that the _first_ query in to one of the not-yet-loaded
>> cores will be slow. The model here is that you can tolerate
>> some queries taking more time at first than you might like
>> in exchange for the hardware savings. This pre-supposes that
>> you simply cannot fit all the cores into memory at once.
>> The "won't fix" bits are there because, as we got farther into this
>> process, the approach changed and the functionality of the
>> won't fix JIRAs was subsumed by other changes by and large.
>> I've got to update that documentation sometime, but just haven't
>> had time yet. If you go down this route, we'll be happy to
>> add your name to the authorized editors of the wiki list if you'd
>> like.
>> Best
>> Erick
>> On Thu, Jun 6, 2013 at 3:08 PM, Aleksey <> wrote:
>>> I was looking at this wiki and linked issues:
>>> they talk about a limit being 100K cores. Is that per server or per
>>> entire fleet because zookeeper needs to manage that?
>>> I was considering a use case where I have tens of millions of indices
>>> but less than a million needs to be active at any time, so they need
>>> to be loaded on demand and evicted when not used for a while.
>>> Also since number one requirement is efficient loading of course I
>>> assume I will store a prebuilt index somewhere so Solr will just
>>> download it and strap it in, right?
>>> The root issue is marked as "won't fix" but some other important
>>> subissues are marked as resolved. What's the overall status of the
>>> effort?
>>> Thank you in advance,
>>> Aleksey
