lucene-dev mailing list archives

From "Hoss Man (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SOLR-11258) ChaosMonkeySafeLeaderWithPullReplicasTest fails a lot & reproducibly: The Monkey ran for over 45 seconds and no jetties were stopped - this is worth investigating!
Date Fri, 18 Aug 2017 19:22:00 GMT
Hoss Man created SOLR-11258:
-------------------------------

             Summary: ChaosMonkeySafeLeaderWithPullReplicasTest fails a lot & reproducibly:
 The Monkey ran for over 45 seconds and no jetties were stopped - this is worth investigating!
                 Key: SOLR-11258
                 URL: https://issues.apache.org/jira/browse/SOLR-11258
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
            Reporter: Hoss Man


Between June 21 and Aug 18, there have been 18 failures like this:

{noformat}
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=ChaosMonkeySafeLeaderWithPullReplicasTest -Dtests.method=test -Dtests.seed=7669B63E9E4D1685 -Dtests.nightly=true -Dtests.slow=true -Dtests.locale=pa-Guru -Dtests.timezone=Europe/Podgorica -Dtests.asserts=true -Dtests.file.encoding=UTF-8
   [junit4] FAILURE 82.4s | ChaosMonkeySafeLeaderWithPullReplicasTest.test <<<
   [junit4]    > Throwable #1: java.lang.AssertionError: The Monkey ran for over 45 seconds and no jetties were stopped - this is worth investigating!
   [junit4]    >        at __randomizedtesting.SeedInfo.seed([7669B63E9E4D1685:FE3D89E430B17B7D]:0)
   [junit4]    >        at org.apache.solr.cloud.ChaosMonkey.stopTheMonkey(ChaosMonkey.java:587)
   [junit4]    >        at org.apache.solr.cloud.ChaosMonkeySafeLeaderWithPullReplicasTest.test(ChaosMonkeySafeLeaderWithPullReplicasTest.java:174)
   [junit4]    >        at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:993)
   [junit4]    >        at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:968)
   [junit4]    >        at java.lang.Thread.run(Thread.java:748)
{noformat}

In my own testing, when these failures happen, the seeds reproduce, which suggests the problem
is a logic flaw in the test that can happen by chance.

Perhaps the ChaosMonkey needs to be changed to get more aggressive about stopping nodes based
on how long it's been since the last time it stopped a node?
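
To make that idea concrete, here is a rough sketch of a stop probability that ramps up the longer the monkey has gone without stopping a node. This is not the existing ChaosMonkey code: the class name EscalatingStopPolicy, its methods, the base probability, and the linear ramp are all hypothetical choices made for illustration. Only the 45 second window comes from the assertion message above.

```java
import java.util.Random;

/**
 * Hypothetical sketch only; not part of Solr's ChaosMonkey.
 * The chance of stopping a node grows linearly with the time elapsed
 * since the last stop, and becomes certain once maxQuietMillis has passed.
 */
public class EscalatingStopPolicy {
    private final double baseProbability; // chance of a stop right after the previous one
    private final long maxQuietMillis;    // quiet period after which a stop is forced
    private long lastStopMillis;          // time of the most recent stop (or of startup)

    public EscalatingStopPolicy(double baseProbability, long maxQuietMillis, long startMillis) {
        if (baseProbability < 0.0 || baseProbability > 1.0 || maxQuietMillis <= 0) {
            throw new IllegalArgumentException("invalid policy parameters");
        }
        this.baseProbability = baseProbability;
        this.maxQuietMillis = maxQuietMillis;
        this.lastStopMillis = startMillis;
    }

    /** Probability of stopping a node now, given how long it has been quiet. */
    public double stopProbability(long nowMillis) {
        long quiet = Math.max(0L, nowMillis - lastStopMillis);
        if (quiet >= maxQuietMillis) {
            return 1.0; // quiet for too long: always stop
        }
        double ramp = (double) quiet / maxQuietMillis;
        return baseProbability + (1.0 - baseProbability) * ramp;
    }

    /** Rolls the dice; on a stop decision, the quiet-period clock is reset. */
    public boolean shouldStop(Random random, long nowMillis) {
        // nextDouble() is in [0.0, 1.0), so a probability of 1.0 always stops.
        boolean stop = random.nextDouble() < stopProbability(nowMillis);
        if (stop) {
            lastStopMillis = nowMillis;
        }
        return stop;
    }
}
```

With something like new EscalatingStopPolicy(0.1, 45000L, startMillis), a seed that happens to roll "don't stop" many times in a row would still be forced into a stop before the 45 second check fires, while the random behaviour early in the run stays unchanged.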


