lucene-solr-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erick Erickson <erickerick...@gmail.com>
Subject Re: solrCloud client socketTimeout initiates retries
Date Fri, 18 Dec 2020 14:17:53 GMT
Right, there are several alternatives. Try going here:
http://jirasearch.mikemccandless.com/search.py?index=jira

and search for “circuit breaker” and you’ll find a bunch
of JIRAs. Unfortunately, some are in 8.8..

That said, some of the circuit breakers are in much earlier
releases. Would it suffice until you can upgrade to set
the circuit breakers?

One problem with your solution is that the query keeps
on running, admittedly on only one replica of each shard.
With circuit breakers, the query itself is stoped, thus freeing
up resources.

Additionally, if you see a pattern (for instance, certain
wildcard patterns) you could intercept that before sending.

Best,
Erick

> On Dec 18, 2020, at 8:52 AM, kshitij tyagi <kshitij.solr@gmail.com> wrote:
> 
> Hi Erick,
> 
> I agree but in a huge cluster the retries keeps on happening, cant we have
> this feature implemented in client.
> i was referring to this jira
> https://issues.apache.org/jira/browse/SOLR-10479
> We have seen that some malicious queries come to system which takes
> significant time and these queries propagating to other solr servers choke
> the entire cluster.
> 
> Regards,
> kshitij
> 
> 
> 
> 
> 
> On Fri, Dec 18, 2020 at 7:12 PM Erick Erickson <erickerickson@gmail.com>
> wrote:
> 
>> Why do you want to do this? This sounds like an XY problem, you
>> think you’re going to solve some problem X by doing Y. Y in this case
>> is setting the numServersToTry, but you haven’t explained what X,
>> the problem you’re trying to solve is.
>> 
>> Offhand, this seems like a terrible idea. If you’re requests are timing
>> out, what purpose is served by _not_ trying the next one on the
>> list? With, of course, a much longer timeout interval…
>> 
>> The code is structured that way on the theory that you want the request
>> to succeed and the system needs to be tolerant of momentary
>> glitches due to network congestion, reading indexes into memory, etc.
>> Bypassing that assumption needs some justification….
>> 
>> Best,
>> Erick
>> 
>>> On Dec 18, 2020, at 6:23 AM, kshitij tyagi <kshitij.solr@gmail.com>
>> wrote:
>>> 
>>> Hi,
>>> 
>>> We have a Solrcloud setup and are using CloudSolrClient, What we are
>> seeing
>>> is if socketTimeoutOccurs then the same request is sent to other solr
>>> server.
>>> 
>>> So if I set socketTimeout to a very low value say 100ms and my query
>> takes
>>> around 200ms then client tries to query second server, then next and so
>>> on(basically all available servers with same query).
>>> 
>>> I see that we have *numServersToTry* in LBSolrClient class but not able
>> to
>>> set this using CloudSolrClient. Using this we can restrict the above
>>> feature.
>>> 
>>> Should a jira be created to support numServersToTry by CloudSolrClient?
>> Or
>>> is there any other way to control the request to other solr servers?.
>>> 
>>> Regards,
>>> kshitij
>> 
>> 


Mime
View raw message