lucene-dev mailing list archives

From Tomás Fernández Löbbe (JIRA) <j...@apache.org>
Subject [jira] [Commented] (SOLR-11258) ChaosMonkeySafeLeaderWithPullReplicasTest fails a lot & reproducibly: The Monkey ran for over 45 seconds and no jetties were stopped - this is worth investigating!
Date Mon, 08 Jan 2018 20:02:00 GMT

    [ https://issues.apache.org/jira/browse/SOLR-11258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16316916#comment-16316916 ]

Tomás Fernández Löbbe commented on SOLR-11258:
----------------------------------------------

This may be the same as SOLR-10995, which I started looking at but apparently never fixed.
Sorry about that; I'll take a look.

> ChaosMonkeySafeLeaderWithPullReplicasTest fails a lot & reproducibly:  The Monkey
ran for over 45 seconds and no jetties were stopped - this is worth investigating!
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-11258
>                 URL: https://issues.apache.org/jira/browse/SOLR-11258
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Hoss Man
>
> Between June 21 & Aug 18, there have been 18 failures like this...
> {noformat}
>    [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=ChaosMonkeySafeLeaderWithPullReplicasTest -Dtests.method=test -Dtests.seed=7669B63E9E4D1685 -Dtests.nightly=true -Dtests.slow=true -Dtests.locale=pa-Guru -Dtests.timezone=Europe/Podgorica -Dtests.asserts=true -Dtests.file.encoding=UTF-8
>    [junit4] FAILURE 82.4s | ChaosMonkeySafeLeaderWithPullReplicasTest.test <<<
>    [junit4]    > Throwable #1: java.lang.AssertionError: The Monkey ran for over 45 seconds and no jetties were stopped - this is worth investigating!
>    [junit4]    >        at __randomizedtesting.SeedInfo.seed([7669B63E9E4D1685:FE3D89E430B17B7D]:0)
>    [junit4]    >        at org.apache.solr.cloud.ChaosMonkey.stopTheMonkey(ChaosMonkey.java:587)
>    [junit4]    >        at org.apache.solr.cloud.ChaosMonkeySafeLeaderWithPullReplicasTest.test(ChaosMonkeySafeLeaderWithPullReplicasTest.java:174)
>    [junit4]    >        at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:993)
>    [junit4]    >        at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:968)
>    [junit4]    >        at java.lang.Thread.run(Thread.java:748)
> {noformat}
> In my own testing, when these failures happen, the seeds reproduce - suggesting the problem
> is a logic flaw in the test that can happen by chance.
> Perhaps the ChaosMonkey needs to be changed to get more aggressive about stopping nodes
> based on how long it's been since the last time it stopped a node?
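
The suggestion in the issue description, stopping nodes more aggressively the longer it has been since the last stop, could be sketched roughly as follows. This is only an illustration of the idea, not the actual ChaosMonkey code: the class name EscalatingStopPolicy, its fields, and the linear ramp are all hypothetical and are not taken from Solr.

```java
import java.util.Random;

/**
 * Hypothetical sketch (not Solr's ChaosMonkey): the probability of stopping
 * a node grows linearly with the time elapsed since the last stop, and
 * reaches 1.0 once maxQuietMillis have passed without any stop.
 */
final class EscalatingStopPolicy {
    private final double baseProbability; // chance of a stop immediately after the previous stop
    private final long maxQuietMillis;    // quiet period after which a stop is forced
    private long lastStopMillis;          // timestamp of the most recent stop

    EscalatingStopPolicy(double baseProbability, long maxQuietMillis, long nowMillis) {
        if (baseProbability < 0.0 || baseProbability > 1.0) {
            throw new IllegalArgumentException("baseProbability must be in [0, 1]");
        }
        if (maxQuietMillis <= 0) {
            throw new IllegalArgumentException("maxQuietMillis must be positive");
        }
        this.baseProbability = baseProbability;
        this.maxQuietMillis = maxQuietMillis;
        this.lastStopMillis = nowMillis;
    }

    /** Probability of stopping a node at nowMillis, ramping from baseProbability up to 1.0. */
    double stopProbability(long nowMillis) {
        long quietMillis = Math.max(0L, nowMillis - lastStopMillis);
        if (quietMillis >= maxQuietMillis) {
            return 1.0; // it has been too long: always stop
        }
        double ramp = (double) quietMillis / maxQuietMillis;
        return baseProbability + (1.0 - baseProbability) * ramp;
    }

    /** Decides whether to stop a node now; a positive decision resets the quiet timer. */
    boolean shouldStop(long nowMillis, Random random) {
        boolean stop = random.nextDouble() < stopProbability(nowMillis);
        if (stop) {
            lastStopMillis = nowMillis;
        }
        return stop;
    }
}
```

With a policy like this, choosing maxQuietMillis below the 45-second threshold checked by the assertion would guarantee at least one stop before that check fires, whatever the random seed. Whether that matches the intent of the test is a separate question.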



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


