lucene-solr-user mailing list archives

From Erick Erickson <>
Subject Re: How large is your solr index?
Date Sun, 11 Jan 2015 20:07:04 GMT
bq: I still feel like shard management could be made easier. I'll see
if I can have a look at JIRA and try to pitch in.

_nobody_ will disagree here! It's been a time and interest thing so far...

Automating it all is tempting, but my experience so far indicates
that once people get to large scales they want to take explicit
control of where replicas land and which nodes host which
collections. They often have heterogeneous collections hosted on the
same hardware, which makes it very difficult to automate entirely.

How's your AngularJS? See SOLR-5507, there's work being done to
rewrite the Admin UI in AngularJS. Since I totally avoid UI work I
don't have much insight into the pros/cons, so I trust the people
putting fingers to keyboards. I'm sure Upayavira would welcome any
help there.

What would be waaay cool, and a step on the path to full automation,
would be the ability to access the collections API through the UI. The
current admin UI really knows nothing about the collections API; it
has its roots in the single-core, no-shard world....

I should add that this is a place to _start_ IMO, "low hanging fruit"
and all that. But once the collections API is accessible, I can
imagine all sorts of stuff layered on top. But just providing a
cloud-friendly way to use the collections API calls would be a great
place to start.

I like to define a short-term goal, just to get things started, so
here's one (feel free to ignore of course!).

I want a way to see all the nodes in my system that have running
Solrs, whether or not they host any existing Solr replicas. I want to
click on any number of them and create a new collection hosted on
those nodes. Really, this is just "click on a bunch of nodes that
then become the node set (the createNodeSet parameter) for the
Collections CREATE action", along with a dialog box for all the
CREATE options.
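
To make that concrete, here is a rough sketch of the call such a
dialog might end up issuing. It only builds the URL and does not
contact Solr. The host names, the collection name and the helper
function are made up for illustration; the query parameters (action,
name, numShards, replicationFactor, maxShardsPerNode, createNodeSet)
are the ones the Collections API CREATE action accepts, and the
"host:port_solr" node-name form is assumed to match what your cluster
reports as live nodes.

```python
from urllib.parse import urlencode


def build_create_url(base_url, name, num_shards, nodes,
                     replication_factor=1, max_shards_per_node=None):
    """Build a Collections API CREATE URL restricted to the chosen nodes.

    Hypothetical helper: `nodes` stands in for whatever the user clicked on.
    """
    if not nodes:
        raise ValueError("select at least one node")
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        # createNodeSet takes a comma-separated list of node names.
        "createNodeSet": ",".join(nodes),
    }
    if max_shards_per_node is not None:
        params["maxShardsPerNode"] = max_shards_per_node
    return base_url.rstrip("/") + "/admin/collections?" + urlencode(params)


# Hypothetical hosts and collection name, for illustration only.
selected = ["host1:8983_solr", "host2:8983_solr"]
url = build_create_url("http://host1:8983/solr", "mycollection", 4,
                       selected, max_shards_per_node=2)
print(url)
```

A UI would presumably collect the remaining CREATE options from the
dialog box and add them to the same parameter map before sending the
request.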


On Sun, Jan 11, 2015 at 8:23 AM, Bram Van Dam <> wrote:
>> Do note that one strategy is to create more shards than you need at
>> the beginning. Say you determine that 10 shards will work fine, but
>> you expect to grow your corpus by 2x. _Start_ with 20 shards
>> (multiple shards can be hosted in the same JVM, no problem, see
>> maxShardsPerNode in the collections API CREATE action). Then
>> as your corpus grows you can move the shards to their own
>> boxes.
> I guess planning ahead is something we can do. We usually have a pretty good
> idea of how large our indexes are going to be (number of documents is one of
> the things we base our license pricing on). I still feel like shard
> management could be made easier. I'll see if I can have a look at JIRA and
> try to pitch in.
> Thanks a lot for the input, especially Shawn & Erick!
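
For the "start with more shards than you need" strategy quoted above,
the arithmetic for maxShardsPerNode can be sketched as follows. This
assumes CREATE has to place numShards * replicationFactor cores across
the available nodes and that maxShardsPerNode caps how many land on
any one node; the function name is made up for illustration.

```python
def min_max_shards_per_node(num_shards, replication_factor, num_nodes):
    """Smallest maxShardsPerNode that lets every core fit on the nodes."""
    if num_nodes < 1:
        raise ValueError("need at least one node")
    total_cores = num_shards * replication_factor
    # Ceiling division without floating point.
    return -(-total_cores // num_nodes)


# 20 shards, no extra replicas, on the 10 boxes sized for today's corpus.
print(min_max_shards_per_node(20, 1, 10))
```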
