lucene-dev mailing list archives

From "Hoss Man (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-11469) LeaderElectionContextKeyTest has flawed logic: 50% of the time it checks the wrong shard's elections
Date Tue, 24 Oct 2017 16:32:00 GMT

    [ https://issues.apache.org/jira/browse/SOLR-11469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16217195#comment-16217195 ]

Hoss Man commented on SOLR-11469:
---------------------------------

bq. We will have same replica names for each collection. ( This satisfy current logic of Assign.buildCoreNodeName )

Hmmm... I'm -0 on having the test make these assumptions.  Fundamentally this is the same problem
the test had before: it makes assumptions about the lower level implementation of how/when
coreNodeNames will be assigned that the client code used in the test doesn't have any control
over -- and doesn't directly validate.  If/when the implementation changes, the test _may_
start failing in unpredictable ways.

Can we please at least add some sanity check assertions to this test that will look at the
clusterstate and fail with a clear error message if the auto-assigned coreNodeNames don't
match the expectations of the test?
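
To make the request concrete, here is a rough sketch of the kind of sanity check meant above.
It is not taken from the patch and it deliberately avoids the real Solr classes: the cluster
state is modeled as a plain map of collection -> shard -> coreNodeNames, and the class name,
method name and the "core_node1"-style values are all hypothetical. In the real test the map
would have to be populated from the clusterstate before anything is killed.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class CoreNodeNameSanityCheck {

  /**
   * Fails with a descriptive message unless every collection in the snapshot
   * exposes exactly the expected coreNodeNames across all of its shards.
   *
   * @param state    hypothetical snapshot: collection -> shard -> coreNodeNames
   * @param expected the coreNodeNames the test logic silently relies on
   */
  public static void assertSameCoreNodeNames(
      Map<String, Map<String, Set<String>>> state, Set<String> expected) {
    for (Map.Entry<String, Map<String, Set<String>>> coll : state.entrySet()) {
      // Collect the names from every shard of this collection, sorted for a stable message.
      Set<String> actual = new TreeSet<>();
      for (Set<String> names : coll.getValue().values()) {
        actual.addAll(names);
      }
      if (!actual.equals(expected)) {
        throw new AssertionError(
            "Test assumption violated: collection '" + coll.getKey()
                + "' has coreNodeNames " + actual
                + " but the test expects " + new TreeSet<>(expected)
                + "; coreNodeName assignment may have changed");
      }
    }
  }
}
```

The point is only that the check runs up front and names the violated assumption, so a change
in how names get assigned shows up as one obvious failure instead of a confusing election
assertion later on.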

> LeaderElectionContextKeyTest has flawed logic: 50% of the time it checks the wrong shard's elections
> ----------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-11469
>                 URL: https://issues.apache.org/jira/browse/SOLR-11469
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Hoss Man
>            Assignee: Cao Manh Dat
>             Fix For: 7.2, master (8.0)
>
>         Attachments: SOLR-11469.patch, SOLR-11469.patch, SOLR-11469_incomplete_and_broken.patch
>
>
> LeaderElectionContextKeyTest is very flaky -- and on Miller's beastit reports it shows
> a suspiciously close to "50%" failure rate.
> Digging into the test I realized that it creates a 2 shard index, then picks "a leader"
> to kill (arbitrarily) and then asserts that the leader election nodes for *shard1* are affected
> ... so ~50% of the time it kills the shard2 leader and then fails because it doesn't see an
> election in shard1.



