lucene-solr-user mailing list archives

From Bram Van Dam <bram.van...@intix.eu>
Subject Re: How large is your solr index?
Date Wed, 07 Jan 2015 09:25:01 GMT
On 01/06/2015 07:54 PM, Erick Erickson wrote:
> Have you considered pre-supposing SolrCloud and using the SPLITSHARD
> API command?

I think that's the direction we'll probably be going. Index size (at 
least for us) can be unpredictable in some cases. Some clients start out 
small and then grow exponentially, while others start big and then don't 
grow much at all. Starting with SolrCloud would at least give us that 
flexibility.

That being said, SPLITSHARD doesn't seem ideal. If a shard reaches a 
certain size, it would be better for us to simply add an extra shard, 
without splitting.
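
For reference, the two Collections API calls being compared could be sketched roughly as below. This is only an illustration: the host, collection name ("clients") and shard names are made up, and as far as I know CREATESHARD only applies to collections that use the "implicit" router, so that setup is assumed here. The snippet just builds the URLs and does not contact Solr.

```python
from urllib.parse import urlencode

# Hypothetical base URL of a Solr node; adjust to the real host.
SOLR = "http://localhost:8983/solr"


def collections_api_url(action, collection, shard, base=SOLR):
    """Build a Collections API URL for a shard-level action."""
    params = {"action": action, "collection": collection, "shard": shard}
    return base + "/admin/collections?" + urlencode(params)


# Split an existing shard into two sub-shards.
split_url = collections_api_url("SPLITSHARD", "clients", "shard1")

# Add a new, empty shard next to the existing ones.
# Assumes the collection was created with router.name=implicit.
create_url = collections_api_url("CREATESHARD", "clients", "shard2")

print(split_url)
print(create_url)
```

With the implicit router the application decides which shard each document goes to, which is what makes "just add another shard" possible without moving existing data.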


> On Tue, Jan 6, 2015 at 10:33 AM, Peter Sturge <peter.sturge@gmail.com> wrote:
>> ++1 for the automagic shard creator. We've been looking into doing this
>> sort of thing internally - i.e. when a shard reaches a certain size/num
>> docs, it creates 'sub-shards' to which new commits are sent and queries to
>> the 'parent' shard are included. The concept works, as long as you don't
>> try any non-dist stuff - it's one reason why all our fields are always
>> single valued.

Is there a problem with multi-valued fields and distributed queries?

>> A cool side-effect of sub-sharding (for lack of a snappy term) is that the
>> parent shard then stops suffering from auto-warming latency due to commits
>> (we do a fair amount of committing). In theory, you could carry on
>> sub-sharding until your hardware starts gasping for air.

Sounds like you're doing something similar to us. In some cases we have 
a hard commit every minute. Keeping the caches hot seems like a very 
good reason to send data to a specific shard. At least I'm assuming that 
when you add documents to a single shard and commit, the other shards 
won't be impacted...
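
To make the "send data to a specific shard" part concrete, here is a minimal sketch of what such an update request could look like. It assumes a collection using the implicit router and the `_route_` request parameter to name the target shard; the host, collection and shard names are hypothetical. It only builds the URL, and it says nothing about whether the other shards' caches really stay untouched by the commit, which is the open question above.

```python
from urllib.parse import urlencode

# Hypothetical base URL of a Solr node; adjust to the real host.
SOLR = "http://localhost:8983/solr"


def routed_update_url(collection, shard, commit=True, base=SOLR):
    """Build an update URL that targets one named shard via _route_."""
    params = {"_route_": shard}
    if commit:
        # Ask Solr to commit as part of this update request.
        params["commit"] = "true"
    return f"{base}/{collection}/update?{urlencode(params)}"


# New documents would be POSTed to this URL (for example as JSON).
update_url = routed_update_url("clients", "shard2")
print(update_url)
```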

  - Bram

