lucene-solr-user mailing list archives
From Hendrik Haddorp <hendrik.hadd...@gmx.net>
Subject Re: Large Number of Collections takes down Solr 7.3
Date Tue, 29 Jan 2019 09:39:17 GMT
How much memory do the Solr instances have? Any more details on what 
happens when the Solr instances start to fail?
We are using multiple Solr clouds to keep the collection count low(er).

On 29.01.2019 06:53, Gus Heck wrote:
> Does it all have to be in a single cloud?
>
> On Mon, Jan 28, 2019, 10:34 PM Shawn Heisey <apache@elyograg.org> wrote:
>
>> On 1/28/2019 8:12 PM, Monica Skidmore wrote:
>>> I would have to negotiate with the middle-ware teams - but we've used a
>>> core per customer in master-slave mode for about 3 years now, with great
>>> success.  Our pool of data is very large, so limiting a customer's searches
>>> to just their core keeps query times fast (or at least reduces the chance
>>> of one customer impacting another with expensive queries).  There is also a
>>> little added security - since the customer is required to provide the core
>>> to search, there is less chance that they'll see another customer's data in
>>> their responses (as they might if they 'forgot' to add a filter to their
>>> query).  We were hoping that moving to Cloud would help our management of
>>> the largest customers - some of which we'd like to sub-shard with the cloud
>>> tooling.  We expected Cloud to support as many cores/collections as our
>>> 2-versions-old Solr instances did - but we didn't count on all the
>>> increased network traffic or the extra complications of bringing up a
>>> large cloud cluster.
>>
>> At this time, SolrCloud will not handle what you're trying to throw at
>> it.  Without Cloud, Solr can fairly easily handle thousands of indexes,
>> because there is no communication between nodes about cluster state.
>> The immensity of that communication (handled via ZooKeeper) is why
>> SolrCloud can't scale to thousands of shard replicas.
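[Editor's note: a rough back-of-envelope illustration of the point above. All numbers are hypothetical, not measured from Solr; the sketch only shows how per-replica state messages multiply into a large Overseer workload.]

```python
# Each replica that changes state publishes a message that the Overseer
# must process, so a full-cluster event (e.g. a restart) generates work
# proportional to the total replica count.  Numbers below are made up
# purely for illustration.

collections = 5000              # hypothetical tenant collections
replicas_per_collection = 2     # e.g. 1 shard x 2 replicas each
state_changes_per_replica = 2   # e.g. down -> recovering -> active

overseer_messages = (collections * replicas_per_collection
                     * state_changes_per_replica)
print(overseer_messages)  # 20000 queue items from a single restart
```

At that scale even a fast Overseer spends a long time draining the queue, which matches the symptom of a large cloud struggling to come up.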
>>
>> The solution to this problem will be twofold:  1) Reduce the number of
>> work items in the Overseer queue.  2) Make the Overseer do its job a lot
>> faster.  There have been small incremental improvements towards these
>> goals, but as you've noticed, we're definitely not there yet.
>>
>> On the subject of a customer forgetting to add a filter ... your systems
>> should be handling that for them.  If the customer has direct access
>> to Solr, then all bets are off - they'll be able to do just about
>> anything they want.  It is possible to configure a proxy to limit what
>> somebody can get to, but coming up with a proxy configuration that
>> fully locks things down would be pretty complicated.
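[Editor's note: a minimal sketch of the middleware-side fix suggested above - the application layer appends a filter query (Solr's real `fq` parameter) for the authenticated tenant before the request reaches Solr, so the customer can never "forget" it. The base URL, core name, and `customer_id` field are assumptions for illustration.]

```python
from urllib.parse import urlencode

def build_tenant_query(solr_base, core, user_query, tenant_id):
    """Build a Solr select URL that is always scoped to one tenant."""
    params = [
        ("q", user_query),
        # fq is appended server-side; the client never controls it, so
        # results are always restricted to the tenant's own documents.
        ("fq", "customer_id:%s" % tenant_id),
        ("wt", "json"),
    ]
    return "%s/%s/select?%s" % (solr_base, core, urlencode(params))

url = build_tenant_query("http://localhost:8983/solr", "customer_acme",
                         "title:widget", "acme")
print(url)
```

The same idea works whether the "middleware" is an application server or a thin proxy: the tenant restriction is injected on every request, not trusted from the caller.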
>>
>> Using shards is completely possible without SolrCloud.  But SolrCloud
>> certainly does make it a lot easier.
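[Editor's note: a sketch of what "shards without SolrCloud" looks like in practice - Solr's documented `shards` request parameter lets the client name the cores to fan a query across, with no ZooKeeper involved. Host and core names here are made up for illustration.]

```python
from urllib.parse import urlencode

def build_sharded_query(coordinator_core_url, shard_cores, user_query):
    """Query one core, asking it to fan out to the listed shard cores."""
    params = {
        "q": user_query,
        # `shards` takes a comma-separated list of host/core endpoints;
        # the receiving core merges the per-shard results itself.
        "shards": ",".join(shard_cores),
    }
    return "%s/select?%s" % (coordinator_core_url, urlencode(params))

url = build_sharded_query(
    "http://host1:8983/solr/big_customer_shard1",
    ["host1:8983/solr/big_customer_shard1",
     "host2:8983/solr/big_customer_shard2"],
    "*:*")
print(url)
```

The cost, as noted, is that shard placement, routing, and failover all become the application's problem - which is exactly what SolrCloud automates.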
>>
>> How many records in your largest customer indexes?  How big are those
>> indexes on disk?
>>
>> Thanks,
>> Shawn
>>

