lucene-dev mailing list archives

From "Amrit Sarkar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-11278) CdcrBootstrapTest failing in branch_6_6
Date Fri, 01 Sep 2017 16:02:00 GMT

    [ https://issues.apache.org/jira/browse/SOLR-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150771#comment-16150771 ]

Amrit Sarkar commented on SOLR-11278:
-------------------------------------

Confirmed: two separate bootstrap threads are initiated. One acquires the lock, the other fails:
{code}
[beaster]   2> 34430 INFO  (updateExecutor-39-thread-1-processing-n:127.0.0.1:35690_solr
x:cdcr-target_shard1_replica_n1 s:shard1 c:cdcr-target r:core_node2) [n:127.0.0.1:35690_solr
c:cdcr-target s:shard1 r:core_node2 x:cdcr-target_shard1_replica_n1] o.a.s.h.CdcrRequestHandler
what' the lock this time :: true
  [beaster]   2> 34431 INFO  (qtp510024884-173) [n:127.0.0.1:35690_solr c:cdcr-target s:shard1
r:core_node2 x:cdcr-target_shard1_replica_n1] o.a.s.c.S.Request [cdcr-target_shard1_replica_n1]
 webapp=/solr path=/cdcr params={qt=/cdcr&masterUrl=http://127.0.0.1:38721/solr/cdcr-source_shard1_replica_n1/&action=BOOTSTRAP&wt=javabin&version=2}
status=0 QTime=10
  [beaster]   2> 34432 INFO  (qtp600983226-216) [n:127.0.0.1:38721_solr c:cdcr-source s:shard1
r:core_node2 x:cdcr-source_shard1_replica_n1] o.a.s.c.S.Request [cdcr-source_shard1_replica_n1]
 webapp=/solr path=/update params={_stateVer_=cdcr-source:4&wt=javabin&version=2}
status=0 QTime=13
  [beaster]   2> 34433 INFO  (updateExecutor-39-thread-2-processing-n:127.0.0.1:35690_solr
x:cdcr-target_shard1_replica_n1 s:shard1 c:cdcr-target r:core_node2) [n:127.0.0.1:35690_solr
c:cdcr-target s:shard1 r:core_node2 x:cdcr-target_shard1_replica_n1] o.a.s.h.CdcrRequestHandler
what' the lock this time :: false
{code}

{{updateExecutor-39-thread-1-processing}}.....
{{updateExecutor-39-thread-2-processing}}.....

Only three lines are printed for {{updateExecutor-39-thread-2-processing}} (two of them added for extra logging).
Why so few, and why is this thread started in the first place?

{code}
  [beaster]   2> 34433 INFO  (updateExecutor-39-thread-2-processing-n:127.0.0.1:35690_solr
x:cdcr-target_shard1_replica_n1 s:shard1 c:cdcr-target r:core_node2) [n:127.0.0.1:35690_solr
c:cdcr-target s:shard1 r:core_node2 x:cdcr-target_shard1_replica_n1] o.a.s.h.CdcrRequestHandler
what' the lock this time :: false
  [beaster]   2> 34433 INFO  (updateExecutor-39-thread-2-processing-n:127.0.0.1:35690_solr
x:cdcr-target_shard1_replica_n1 s:shard1 c:cdcr-target r:core_node2) [n:127.0.0.1:35690_solr
c:cdcr-target s:shard1 r:core_node2 x:cdcr-target_shard1_replica_n1] o.a.s.h.CdcrRequestHandler
we reached this point :: CANCEL BOOTSTRAP, locked :: false
  [beaster]   2> 34433 INFO  (updateExecutor-39-thread-2-processing-n:127.0.0.1:35690_solr
x:cdcr-target_shard1_replica_n1 s:shard1 c:cdcr-target r:core_node2) [n:127.0.0.1:35690_solr
c:cdcr-target s:shard1 r:core_node2 x:cdcr-target_shard1_replica_n1] o.a.s.h.CdcrRequestHandler
someone called me, yes they did
{code}

My best guess:

After {{updateExecutor-39-thread-1-processing}} gets the lock and invokes the BOOTSTRAP API,
an update is received on the {{source}} collection. Right after that, another thread, *"updateExecutor-39-thread-2-processing"*,
is invoked and tries to acquire the lock. When that fails, it invokes CANCEL_BOOTSTRAP,
creating chaos.

*The narrow time window between INVOKING bootstrap, CHANGING the bootstrap status to running, and RECEIVING
a new update on source is creating the confusion/chaos: handleBootrapStatus is a runnable thread,
so more than one can get invoked.*
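
To make the guess above concrete, here is a minimal Java sketch of the suspected interleaving. It is not the CdcrRequestHandler code: the class name, the lock, the cancelled flag and the event strings are all hypothetical stand-ins, and the latches exist only to force the ordering seen in the log (first task holds the lock while the second one tries it).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only; none of these names come from Solr.
class BootstrapRaceSketch {
    private final ReentrantLock bootstrapLock = new ReentrantLock();
    private final AtomicBoolean bootstrapCancelled = new AtomicBoolean(false);
    private final List<String> events = new CopyOnWriteArrayList<>();

    /** Runs two bootstrap tasks in the order suggested by the log and records what each one does. */
    List<String> simulate() {
        CountDownLatch firstHoldsLock = new CountDownLatch(1);
        CountDownLatch secondFinished = new CountDownLatch(1);
        ExecutorService updateExecutor = Executors.newFixedThreadPool(2);

        // First task: wins the lock and keeps "bootstrapping" while the second task arrives.
        updateExecutor.submit(() -> {
            boolean locked = bootstrapLock.tryLock();
            events.add("thread-1 locked=" + locked);
            firstHoldsLock.countDown();
            try {
                secondFinished.await();
                events.add("thread-1 sees cancelled=" + bootstrapCancelled.get());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                if (locked) {
                    bootstrapLock.unlock();
                }
            }
        });

        // Second task: loses the lock and, as guessed above, cancels the bootstrap in progress.
        updateExecutor.submit(() -> {
            try {
                firstHoldsLock.await();
                boolean locked = bootstrapLock.tryLock();
                events.add("thread-2 locked=" + locked);
                if (locked) {
                    bootstrapLock.unlock();
                } else {
                    bootstrapCancelled.set(true);
                    events.add("thread-2 cancels bootstrap");
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                secondFinished.countDown();
            }
        });

        updateExecutor.shutdown();
        try {
            if (!updateExecutor.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(events);
    }

    public static void main(String[] args) {
        new BootstrapRaceSketch().simulate().forEach(System.out::println);
    }
}
```

In this sketch the task that loses the lock ends up cancelling work owned by the task that holds it, which would be consistent with the target ending up with only part of the documents. Whether the real handler behaves this way still needs to be confirmed against the code.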

Got the cause, need the solution.

> CdcrBootstrapTest failing in branch_6_6
> ---------------------------------------
>
>                 Key: SOLR-11278
>                 URL: https://issues.apache.org/jira/browse/SOLR-11278
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: CDCR
>            Reporter: Amrit Sarkar
>            Assignee: Varun Thacker
>         Attachments: SOLR-11278-cancel-bootstrap-on-stop.patch, SOLR-11278.patch, test_results
>
>
> I ran beast for 10 rounds:
> ant beast -Dtestcase=CdcrBootstrapTest -Dtests.multiplier=2 -Dtests.slow=true -Dtests.locale=vi
-Dtests.timezone=Asia/Yekaterinburg -Dtests.asserts=true -Dtests.file.encoding=US-ASCII -Dbeast.iters=10
> and am seeing the following failure:
> {code}
>   [beaster] [01:37:16.282] FAILURE  153s | CdcrBootstrapTest.testBootstrapWithSourceCluster
<<<
>   [beaster]    > Throwable #1: java.lang.AssertionError: Document mismatch on target
after sync expected:<2000> but was:<1000>
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


