lucene-solr-user mailing list archives

From Peter Sturge <>
Subject Re: How large is your solr index?
Date Tue, 06 Jan 2015 18:33:27 GMT
Yes, totally agree. We run 500m+ docs in a (non-cloud) Solr 4, and it even
performs reasonably well on commodity hardware with lots of faceting and
concurrent indexing! OK, you need a lot of RAM to keep faceting happy, but
it works.

++1 for the automagic shard creator. We've been looking into doing this
sort of thing internally - i.e. when a shard reaches a certain size/number
of docs, it creates 'sub-shards' to which new commits are sent, and queries
against the 'parent' shard include them as well. The concept works, as long
as you don't try any non-distributed stuff - it's one reason why all our
fields are always single-valued. There are other implications to take into
account as well, like cleanup, deletes and security, to name a few.
A cool side-effect of sub-sharding (for lack of a snappy term) is that the
parent shard then stops suffering from auto-warming latency due to commits
(we do a fair amount of committing). In theory, you could carry on
sub-sharding until your hardware starts gasping for air.
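For what it's worth, the rollover logic boils down to something like the sketch below. This is just an illustration of the idea, not our actual code - the class name, shard-naming scheme and threshold are all made up; the 2B figure is Lucene's hard per-index document limit:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of sub-shard rollover: once the active shard's doc
// count crosses a threshold, new commits go to a freshly created sub-shard,
// while queries fan out across the parent and all sub-shards.
public class SubShardRouter {
    // Lucene caps a single index at ~2^31 docs; roll over well before that.
    static final long LUCENE_MAX_DOCS = 2_000_000_000L;

    final long threshold;
    final List<String> shards = new ArrayList<>();
    final List<Long> docCounts = new ArrayList<>();

    SubShardRouter(String parentShard, long threshold) {
        this.threshold = threshold;
        shards.add(parentShard);
        docCounts.add(0L);
    }

    /** Shard that should receive the next commit; creates a new
     *  sub-shard when the active one has reached the threshold. */
    String shardForNextDoc() {
        int active = shards.size() - 1;
        if (docCounts.get(active) >= threshold) {
            // naming scheme is made up for illustration
            shards.add(shards.get(0) + "_sub" + active);
            docCounts.add(0L);
            active++;
        }
        docCounts.set(active, docCounts.get(active) + 1);
        return shards.get(active);
    }

    /** Queries must fan out over the parent and every sub-shard. */
    List<String> shardsForQuery() {
        return new ArrayList<>(shards);
    }
}
```

The side-effect mentioned above falls out of this naturally: commits only ever hit the newest sub-shard, so the parent's searchers stop being invalidated and re-warmed.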

On Sun, Jan 4, 2015 at 1:44 PM, Bram Van Dam <> wrote:

> On 01/04/2015 02:22 AM, Jack Krupansky wrote:
>> The reality doesn't seem to
>> be there today. 50 to 100 million documents, yes, but beyond that takes
>> some kind of "heroic" effort, whether a much beefier box, very careful and
>> limited data modeling or limiting of query capabilities or tolerance of
>> higher latency, expert tuning, etc.
> I disagree. On the scale, at least. Up until 500M Solr performs "well"
> (read: well enough considering the scale) in a single shard on a single box
> of commodity hardware. Without any tuning or heroic efforts. Sure, some
> queries aren't as snappy as you'd like, and sure, indexing and querying at
> the same time will be somewhat unpleasant, but it will work, and it will
> work well enough.
> Will it work for thousands of concurrent users? Of course not. Anyone who
> is after that sort of thing won't find themselves in this scenario -- they
> will throw hardware at the problem.
> There is something to be said for making sharding less painful. It would
> be nice if, for instance, Solr would automagically create a new shard once
> some magic number was reached (2B at the latest, I guess). But then that'll
> break some query features ... :-(
> The reason we're using single large instances (sometimes on beefy
> hardware) is that SolrCloud is a pain. Not just from an administrative
> point of view (though that seems to be getting better, kudos for that!),
> but mostly because some queries cannot be executed with distributed=true.
> Our users, at least, prefer a slow query over an impossible query.
> Actually, this 2B limit is a good thing. It'll help me convince
> $management to donate some of our time to Solr :-)
>  - Bram
