lucene-solr-user mailing list archives

From: Bram Van Dam <bram.van...@intix.eu>
Subject: Re: How large is your solr index?
Date: Sun, 04 Jan 2015 13:44:39 GMT
On 01/04/2015 02:22 AM, Jack Krupansky wrote:
> The reality doesn't seem to be there today. 50 to 100 million documents,
> yes, but beyond that takes some kind of "heroic" effort, whether a much
> beefier box, very careful and limited data modeling, limiting of query
> capabilities, tolerance of higher latency, expert tuning, etc.

I disagree. On the scale, at least. Up to about 500M documents, Solr 
performs "well" (read: well enough considering the scale) in a single 
shard on a single box of commodity hardware, without any tuning or 
heroic efforts. Sure, some queries aren't as snappy as you'd like, and 
sure, indexing and querying at the same time will be somewhat 
unpleasant, but it will work, and it will work well enough.

Will it work for thousands of concurrent users? Of course not. Anyone 
who is after that sort of thing won't find themselves in this scenario 
-- they will throw hardware at the problem.

There is something to be said for making sharding less painful. It would 
be nice if, for instance, Solr would automagically create a new shard 
once some magic number was reached (2B documents at the latest, since 
Lucene addresses documents within an index with a signed 32-bit int). 
But then that'll break some query features ... :-(
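
Today you'd have to bolt that on yourself. Something like this rough 
sketch (Lucene 4.x-era API; the paths and the mycoll/shard1 names are 
made up for illustration) watches a core's maxDoc() and fires a 
Collections API SPLITSHARD when it creeps towards the ceiling:

  import java.io.File;
  import java.net.URL;
  import org.apache.lucene.index.DirectoryReader;
  import org.apache.lucene.store.FSDirectory;

  public class ShardWatch {
    public static void main(String[] args) throws Exception {
      // Lucene doc ids are signed 32-bit ints, hence the ~2.1B ceiling.
      final long ceiling = Integer.MAX_VALUE;
      File index = new File("/var/solr/mycoll_shard1_replica1/data/index");
      try (DirectoryReader reader =
               DirectoryReader.open(FSDirectory.open(index))) {
        int maxDoc = reader.maxDoc(); // includes deleted-but-unmerged docs,
                                      // which count against the limit too
        if (maxDoc > ceiling * 0.9) {
          // Collections API shard split (in Solr since 4.3):
          new URL("http://localhost:8983/solr/admin/collections"
              + "?action=SPLITSHARD&collection=mycoll&shard=shard1")
              .openStream().close();
        }
      }
    }
  }

Not that this is real auto-sharding -- SPLITSHARD splits an existing 
shard in two rather than adding an empty one -- but it would at least 
keep you under the hard limit.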

The reason we're using single large instances (sometimes on beefy 
hardware) is that SolrCloud is a pain. Not just from an administrative 
point of view (though that seems to be getting better, kudos for that!), 
but mostly because some queries cannot be executed with distrib=true. 
Our users, at least, prefer a slow query over an impossible query.
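
If you are on SolrCloud, you can still run such a query against one core 
at a time by passing distrib=false -- which is roughly that "slow but 
possible" trade-off. A rough SolrJ sketch (4.x-era HttpSolrServer; the 
core URL and the join query are placeholders):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class SingleCoreQuery {
    public static void main(String[] args) throws Exception {
      HttpSolrServer solr =
          new HttpSolrServer("http://localhost:8983/solr/mycoll_shard1_replica1");
      // {!join} is one of the features that won't fan out across shards.
      SolrQuery q = new SolrQuery("{!join from=parent_id to=id}type:child");
      q.set("distrib", false);  // execute on this core only
      QueryResponse rsp = solr.query(q);
      System.out.println("hits: " + rsp.getResults().getNumFound());
    }
  }

Of course that only gives you the answer for that one core, so you have 
to live with partial results or merge them yourself.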

Actually, this 2B limit is a good thing. It'll help me convince 
$management to donate some of our time to Solr :-)

  - Bram
