lucene-dev mailing list archives

From "Mark Miller (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SOLR-10525) Stacked recovery requests can interfere with one another
Date Wed, 19 Apr 2017 21:39:41 GMT

    [ https://issues.apache.org/jira/browse/SOLR-10525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15975573#comment-15975573 ]

Mark Miller edited comment on SOLR-10525 at 4/19/17 9:39 PM:
-------------------------------------------------------------

bq.  looks like the same issue of multiple requests stacking.

By the way, the reason for SOLR-8702 is to examine having no stacking at all. The issue that
reduced stacking was titled more like "reduce stacking", not "eliminate stacking". To eliminate
it, we would want a patch, and we would need to examine whether the change is worth any
slowdown in recovery calls it might cause. Right now we can get hammered by recovery calls, and
they should all be very, very fast and result in few or no stack-ups. Previously, every request
stacked up.

In other words, if you eliminate stacking completely, is a recovery request going to cost
more than a tryLock and an atomic integer increment? Because in a concurrent environment, that
is super fast.
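
To make that cost concrete, here is a minimal sketch of what a redundant request pays before
it bails: one non-blocking tryLock and one atomic operation on the counter. This is an
illustration only, not the Solr code. The class RejectedRequestCost and the helpers bailsOut
and hammer are hypothetical names, and the timing printed by main is a rough wall-clock
figure, not a proper benchmark.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only; class and method names are made up, not Solr code.
class RejectedRequestCost {

  // The work a redundant request does before bailing: one tryLock and one atomic read.
  static boolean bailsOut(ReentrantLock recoveryLock, AtomicInteger recoveryWaiting) {
    boolean locked = recoveryLock.tryLock(); // never blocks; false if another thread holds it
    try {
      // Bail when a recovery is running (lock held) and one is already queued behind it.
      return !locked && recoveryWaiting.get() > 0;
    } finally {
      if (locked) {
        recoveryLock.unlock();
      }
    }
  }

  // Sends `requests` calls from a second thread while this thread holds the lock,
  // and returns how many of them bailed out.
  static int hammer(int requests) {
    ReentrantLock recoveryLock = new ReentrantLock();
    AtomicInteger recoveryWaiting = new AtomicInteger(1); // pretend one request is already queued
    AtomicInteger bailed = new AtomicInteger();
    recoveryLock.lock(); // simulate a recovery in progress
    try {
      Thread caller = new Thread(() -> {
        for (int i = 0; i < requests; i++) {
          if (bailsOut(recoveryLock, recoveryWaiting)) {
            bailed.incrementAndGet();
          }
        }
      });
      caller.start();
      caller.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      recoveryLock.unlock();
    }
    return bailed.get();
  }

  public static void main(String[] args) {
    int requests = 1_000_000;
    long start = System.nanoTime();
    int bailed = hammer(requests);
    long elapsed = System.nanoTime() - start;
    // Rough wall-clock figure only; a real measurement would need a harness such as JMH.
    System.out.println(bailed + " of " + requests + " requests bailed, ~"
        + (elapsed / requests) + " ns each");
  }
}
```

Every call returns without waiting on the lock, which is why a burst of recovery requests
should stay cheap as long as the check itself does not block.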


was (Author: markrmiller@gmail.com):
bq.  looks like the same issue of multiple requests stacking.

By the way, the reason for SOLR-8702 is to examine having no stacking at all. The issue that
reduced stacking was titled more like "reduce stacking", not "eliminate stacking". To eliminate
it, we would want a patch, and we would need to examine whether the change is worth any
slowdown in recovery calls it might cause. Right now we can get hammered by recovery calls, and
they should all be very, very fast and result in few or no stack-ups. Previously, every request
stacked up.

> Stacked recovery requests can interfere with one another
> --------------------------------------------------------
>
>                 Key: SOLR-10525
>                 URL: https://issues.apache.org/jira/browse/SOLR-10525
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>            Reporter: Mike Drob
>         Attachments: SOLR-10525.patch
>
>
> https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/update/DefaultSolrCoreState.java#L300-L310
> Two issues with this code:
> {code}
>           boolean locked = recoveryLock.tryLock();
>           try {
>             if (!locked) {
>               if (recoveryWaiting.get() > 0) { // line 1
>                 return;
>               }
>               recoveryWaiting.incrementAndGet(); // line 2
>             } else {
>               recoveryWaiting.incrementAndGet();
>               cancelRecovery(); // line 3
>             }
> {code}
> The {{cancelRecovery}} call on line 3 will only be hit when there are no recoveries to actually
cancel (since we got the lock, there are no recoveries in progress). Instead, it should be
moved either to the other branch of the if, or outside after the if, since we know we will
be running a recovery at that point.
> This code doesn't always prevent multiple requests from stacking. If there is a recovery
running but no recoveries currently waiting, multiple requests can check the count at line
1 before any of them increments the count at line 2, and thus all of them will hit the
increment.
> I don't have specific tests for this, but it's causing failures for me on my SOLR-9555
work in progress.
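
One possible shape for addressing both points, as a sketch under stated assumptions and not
the attached SOLR-10525.patch: replace the separate get() and incrementAndGet() with a single
compareAndSet(0, 1), so the check and the increment cannot interleave, and call cancelRecovery
only after the request knows it will run. RecoveryGate, requestRecovery and admittedWhileBusy
are hypothetical names, cancelRecovery is a no-op placeholder, and the real
DefaultSolrCoreState does considerably more.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of the gating logic; NOT DefaultSolrCoreState and NOT the attached patch.
class RecoveryGate {
  private final ReentrantLock recoveryLock = new ReentrantLock();
  private final AtomicInteger recoveryWaiting = new AtomicInteger();

  // Placeholder for the real cancelRecovery(); a no-op in this sketch.
  private void cancelRecovery() {}

  // Returns false if the request bails because another request is already waiting.
  boolean requestRecovery(Runnable recovery) {
    // Check and increment in one atomic step, so two requests cannot both observe 0.
    if (!recoveryWaiting.compareAndSet(0, 1)) {
      return false;
    }
    // We now know we will run a recovery, so cancel whatever may be in progress.
    cancelRecovery();
    recoveryLock.lock(); // blocks until any in-progress recovery releases the lock
    try {
      recoveryWaiting.decrementAndGet(); // free the waiting slot for the next request
      recovery.run();
    } finally {
      recoveryLock.unlock();
    }
    return true;
  }

  // Demo helper: fires `requests` concurrent requests while a recovery is in progress
  // and returns how many were admitted; the rest bailed out.
  static int admittedWhileBusy(int requests) {
    RecoveryGate gate = new RecoveryGate();
    AtomicInteger admitted = new AtomicInteger();
    CountDownLatch bailed = new CountDownLatch(Math.max(0, requests - 1));
    Thread[] threads = new Thread[requests];
    gate.recoveryLock.lock(); // simulate a recovery in progress on this thread
    try {
      for (int i = 0; i < requests; i++) {
        threads[i] = new Thread(() -> {
          if (gate.requestRecovery(() -> {})) {
            admitted.incrementAndGet();
          } else {
            bailed.countDown();
          }
        });
        threads[i].start();
      }
      bailed.await(); // all but one request have bailed; the other has claimed the slot
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      gate.recoveryLock.unlock(); // the simulated recovery finishes
    }
    try {
      for (Thread t : threads) {
        t.join();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return admitted.get();
  }

  public static void main(String[] args) {
    System.out.println("admitted = " + admittedWhileBusy(8)); // expected: admitted = 1
  }
}
```

A compareAndSet costs about the same as the atomic increment it replaces, so this variant
should not make the rejected-request path noticeably slower.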



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

