ambari-dev mailing list archives

From Jonathan Hurley <jhur...@hortonworks.com>
Subject Re: Server unit tests take too long (30+ minutes)
Date Wed, 25 Mar 2015 18:28:01 GMT
Builds are passing again after fixing 10197:

https://builds.apache.org/job/Ambari-trunk-Commit/2111

> On Mar 24, 2015, at 11:45 PM, Jayush Luniya <jluniya@hortonworks.com> wrote:
> 
> Done. 
> 
> https://issues.apache.org/jira/browse/AMBARI-10197
> 
> Thanks
> Jayush
> 
> 
> On 3/24/15, 7:50 PM, "Jonathan Hurley" <jhurley@hortonworks.com> wrote:
> 
>> Ah, I see that. Looks like TestController.TestController is a common
>> theme here then. I tried running the tests on CentOS 6 instead of OSX and
>> it looks like mine hung on test_certSigningFailed the first time and
>> test_heartbeat_no_host_check_cmd_in_queue the second time.
>> 
>> Let’s open up a Jira for this so it can be tracked and resolved.
>> 
>>> On Mar 24, 2015, at 7:20 PM, Jayush Luniya <jluniya@hortonworks.com>
>>> wrote:
>>> 
>>> Hi Jonathan,
>>> Yes, as I mentioned, the UT tests hang, though not 100% reproducibly. The
>>> BOA is aborted after 2 hours.
>>> 
>>> However, the builds always hang during the Ambari Agent tests. If you see
>>> the logs further up, you will see that the actual abort happened during the
>>> TestController UTs (i.e. Python was terminated), but the build was not yet
>>> entirely terminated, and hence it continued building the Ambari client and
>>> Python client until it was completely aborted.
>>> 
>>> test_addToStatusQueue (TestController.TestController) ... ok
>>> test_certSigningFailed (TestController.TestController) ... ok
>>> test_heartbeatWithServer (TestController.TestController) ... ok
>>> test_registerAndHeartbeat (TestController.TestController) ... ok
>>> test_registerAndHeartbeatWithException (TestController.TestController) ... ok
>>> test_registerAndHeartbeat_check_registration_listener (TestController.TestController) ... Build timed out (after 120 minutes). Marking the build as aborted.
>>> Build was aborted
>>> 
>>> /home/jenkins/jenkins-slave/workspace/Ambari-trunk-Commit/ambari-agent/../ambari-common/src/main/unix/ambari-python-wrap: line 40: 31955 Terminated          $PYTHON "$@"
>>> [INFO]
>>> [INFO] ------------------------------------------------------------------------
>>> [INFO] Building Ambari Client 2.0.0-SNAPSHOT
>>> [INFO] ------------------------------------------------------------------------
>>> [INFO]
>>> [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ ambari-client ---
>>> [INFO] Deleting /home/jenkins/jenkins-slave/workspace/Ambari-trunk-Commit/ambari-client (includes = [**/*.pyc], excludes = [])
>>> [INFO]
>>> [INFO] --- build-helper-maven-plugin:1.8:regex-property (parse-package-version) @ ambari-client ---
>>> [INFO]
>>> [INFO] --- build-helper-maven-plugin:1.8:regex-property (parse-package-release) @ ambari-client ---
>>> [INFO]
>>> [INFO] --- apache-rat-plugin:0.11:check (default) @ ambari-client ---
>>> [INFO] 53 implicit excludes (use -debug for more details).
>>> [INFO] No excludes explicitly specified.
>>> [INFO] 2 resources included (use -debug for more details)
>>> [INFO] Rat check: Summary of files. Unapproved: 0 unknown: 0 generated: 0 approved: 2 licence.
>>> [INFO]
>>> [INFO] --- maven-assembly-plugin:2.2-beta-5:single (build-tarball) @ ambari-client ---
>>> [INFO] Reading assembly descriptor: assemblies/client.xml
>>> [INFO]
>>> [INFO] --- maven-assembly-plugin:2.2-beta-5:single (make-assembly) @ ambari-client ---
>>> [INFO] Reading assembly descriptor: assemblies/client.xml
>>> [INFO]
>>> [INFO] --- maven-install-plugin:2.4:install (default-install) @ ambari-client ---
>>> [INFO] Installing /home/jenkins/jenkins-slave/workspace/Ambari-trunk-Commit/ambari-client/pom.xml to /home/jenkins/.m2/repository/org/apache/ambari/ambari-client/2.0.0-SNAPSHOT/ambari-client-2.0.0-SNAPSHOT.pom
>>> [INFO]
>>> [INFO] ------------------------------------------------------------------------
>>> [INFO] Building Ambari Python Client 2.0.0-SNAPSHOT
>>> [INFO] ------------------------------------------------------------------------
>>> [INFO]
>>> [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ python-client ---
>>> [INFO] Deleting /home/jenkins/jenkins-slave/workspace/Ambari-trunk-Commit/ambari-client/python-client (includes = [**/*.pyc], excludes = [])
>>> [INFO]
>>> [INFO] --- build-helper-maven-plugin:1.8:regex-property (parse-package-version) @ python-client ---
>>> [INFO]
>>> [INFO] --- build-helper-maven-plugin:1.8:regex-property (parse-package-release) @ python-client ---
>>> [INFO]
>>> [INFO] --- exec-maven-plugin:1.2:exec (python-test) @ python-client ---
>>> Updating AMBARI-10163
>>> Recording test results
>>> Warning: you have no plugins providing access control for builds, so falling back to legacy behavior of permitting any downstream builds to be triggered
>>> Finished: ABORTED
>>> 
>>> Thanks
>>> Jayush
>>> 
>>> On 3/24/15, 1:25 PM, "Jonathan Hurley" <jhurley@hortonworks.com> wrote:
>>> 
>>>> I think that we're looking in the wrong places. Consider:
>>>> 
>>>> https://builds.apache.org/job/Ambari-trunk-Commit/2101
>>>> and
>>>> https://builds.apache.org/job/Ambari-trunk-Commit/2100
>>>> 
>>>> 2101 successfully built in about an hour. 2100 did not; it aborted after
>>>> 2 hours. It aborted during the Groovy unit tests. Ambari unit test time
>>>> variances should not swing the total job time by an hour.
>>>> 
>>>> Perhaps something else is going on here. Maybe there's a network issue
>>>> and Git or one of the Maven build steps is taking too long.
>>>> 
>>>> The pattern seems to be that the builds are not stuck since they are
>>>> aborted at different stages in between jobs. Groovy, agent tests, etc.
>>>> 
>>>> 
>>>> On Mar 24, 2015, at 4:07 PM, Jonathan Hurley
>>>> <jhurley@hortonworks.com<mailto:jhurley@hortonworks.com>> wrote:
>>>> 
>>>> No, that change should have no effect on the tests. There were aborted
>>>> runs before that change, and there were failed runs after it. It seems
>>>> like in some cases, the tests just take too long.
>>>> 
>>>> On Mar 24, 2015, at 3:55 PM, Jayush Luniya
>>>> <jluniya@hortonworks.com<mailto:jluniya@hortonworks.com>> wrote:
>>>> 
>>>> This is the change that went in with build #2072.
>>>> 
>>>> Jonathan, any chance the issue below could have been caused by it?
>>>> Sumit, what was the commit version of your change to re-enable
>>>> TestController tests and when was it committed?
>>>> 
>>>> 
>>>> 1. AMBARI-10126 <https://issues.apache.org/jira/browse/AMBARI-10126> -
>>>> Alert Scheduler Is Double Scheduling Jobs (jonathanhurley) (details
>>>> <https://builds.apache.org/job/Ambari-trunk-Commit/2072/changes#detail0>)
>>>> 
>>>> Commit 68468feeeeb35ca9edd4899ea8b1abafb7c2742a
>>>> <http://git-wip-us.apache.org/repos/asf?p=ambari.git&a=commit&h=68468feeeeb35ca9edd4899ea8b1abafb7c2742a>
>>>> by jhurley <https://builds.apache.org/user/jhurley/>
>>>> AMBARI-10126 <https://issues.apache.org/jira/browse/AMBARI-10126> - Alert
>>>> Scheduler Is Double Scheduling Jobs (jonathanhurley)
>>>> 
>>>> ambari-agent/src/main/python/ambari_agent/Controller.py
>>>> <http://git-wip-us.apache.org/repos/asf?p=ambari.git&a=blob&f=ambari-agent/src/main/python/ambari_agent/Controller.py&h=bb85337bfdf2404a6aabf78eb361c112f77c977e&hb=68468feeeeb35ca9edd4899ea8b1abafb7c2742a>
>>>> (diff)
>>>> <http://git-wip-us.apache.org/repos/asf?p=ambari.git&a=blobdiff&f=ambari-agent/src/main/python/ambari_agent/Controller.py&fp=ambari-agent/src/main/python/ambari_agent/Controller.py&h=eeca4c294399e04dae8d893f078d6e6125f3df47&hp=bb85337bfdf2404a6aabf78eb361c112f77c977e&hb=68468feeeeb35ca9edd4899ea8b1abafb7c2742a&hpb=32e1215639f3cdfea68e2955f316576f1ded85fe>
>>>> 
>>>> 
>>>> Thanks
>>>> Jayush
>>>> 
>>>> On 3/24/15, 12:49 PM, "Sumit Mohanty"
>>>> <smohanty@hortonworks.com<mailto:smohanty@hortonworks.com>> wrote:
>>>> 
>>>> The TestController tests are the ones I recently re-enabled to run on Mac.
>>>> So we may see these failures locally as well if your dev box is a Mac.
>>>> ________________________________________
>>>> From: Jayush Luniya
>>>> <jluniya@hortonworks.com<mailto:jluniya@hortonworks.com>>
>>>> Sent: Tuesday, March 24, 2015 12:24 PM
>>>> To: Alejandro Fernandez;
>>>> dev@ambari.apache.org<mailto:dev@ambari.apache.org>
>>>> Subject: Re: Server unit tests take too long (30+ minutes)
>>>> 
>>>> Agreed, we should take a look at reducing our test times.
>>>> 
>>>> Also, I looked at the latest builds on trunk. It looks like the agent
>>>> tests are hanging as well, leading to builds being aborted. The culprit
>>>> seems to be the TestController tests. This is not a consistent failure,
>>>> but it has happened very frequently since build #2072:
>>>> https://builds.apache.org/job/Ambari-trunk-Commit/
>>>> 
>>>> 
>>>> test_repeatRegistration (TestController.TestController) ... ok
>>>> test_restartAgent (TestController.TestController) ... ok
>>>> test_run (TestController.TestController) ... Build timed out (after 120 minutes). Marking the build as aborted.
>>>> Build was aborted
>>>> 
>>>> /home/jenkins/jenkins-slave/workspace/Ambari-trunk-Commit/ambari-agent/../ambari-common/src/main/unix/ambari-python-wrap: line 40: 20024 Terminated        $PYTHON "$@"
>>>> 
>>>> Thanks
>>>> Jayush
>>>> 
>>>> From: Alejandro Fernandez
>>>> <afernandez@hortonworks.com<mailto:afernandez@hortonworks.com>>
>>>> Date: Tuesday, March 24, 2015 at 12:18 PM
>>>> To: "dev@ambari.apache.org<mailto:dev@ambari.apache.org>"
>>>> <dev@ambari.apache.org<mailto:dev@ambari.apache.org>>
>>>> Cc: Jayush Luniya
>>>> <jluniya@hortonworks.com<mailto:jluniya@hortonworks.com>>
>>>> Subject: Re: Server unit tests take too long (30+ minutes)
>>>> 
>>>> +1 to that.
>>>> 
>>>> grep -B1 ".*sec$" ~/test_times.txt | sed 's/^.*Time elapsed: \(.*\)$/\1/'
>>>> 
>>>> Here's another run with all tests that took over 30 secs. Total time in
>>>> these 28 test classes was 28 mins.
>>>> The biggest culprit was AmbariManagementControllerTest at 5:28
>>>> 
>>>> Running org.apache.ambari.server.agent.TestHeartbeatHandler
>>>> 89.435 sec
>>>> 
>>>> Running org.apache.ambari.server.upgrade.UpgradeTest
>>>> 76.566 sec
>>>> 
>>>> Running org.apache.ambari.server.security.authorization.AmbariLdapAuthenticationProviderForDNWithSpaceTest
>>>> 55.582 sec
>>>> 
>>>> Running org.apache.ambari.server.security.authorization.TestUsers
>>>> 43.228 sec
>>>> 
>>>> Running org.apache.ambari.server.security.authorization.AmbariLdapAuthenticationProviderTest
>>>> 57.922 sec
>>>> 
>>>> Running org.apache.ambari.server.controller.internal.StackDefinedPropertyProviderTest
>>>> 56.585 sec
>>>> 
>>>> Running org.apache.ambari.server.controller.internal.RepositoryVersionResourceProviderTest
>>>> 60.788 sec
>>>> 
>>>> Running org.apache.ambari.server.controller.internal.UpgradeResourceProviderTest
>>>> 40.329 sec
>>>> 
>>>> Running org.apache.ambari.server.controller.internal.HostStackVersionResourceProviderTest
>>>> 34.812 sec
>>>> 
>>>> Running org.apache.ambari.server.controller.internal.StageResourceProviderTest
>>>> 37.434 sec
>>>> 
>>>> Running org.apache.ambari.server.controller.AmbariServerTest
>>>> 37.638 sec
>>>> 
>>>> Running org.apache.ambari.server.controller.AmbariManagementControllerTest
>>>> 317.327 sec
>>>> 
>>>> Running org.apache.ambari.server.actionmanager.TestActionDBAccessorImpl
>>>> 53.404 sec
>>>> 
>>>> Running org.apache.ambari.server.scheduler.ExecutionScheduleManagerTest
>>>> 34.245 sec
>>>> 
>>>> Running org.apache.ambari.server.notifications.dispatchers.SNMPDispatcherTest
>>>> 34.732 sec
>>>> 
>>>> Running org.apache.ambari.server.state.UpgradeHelperTest
>>>> 35.616 sec
>>>> 
>>>> Running org.apache.ambari.server.state.alerts.AlertEventPublisherTest
>>>> 62.627 sec
>>>> 
>>>> Running org.apache.ambari.server.state.alerts.AlertDefinitionHashTest
>>>> 42.206 sec
>>>> 
>>>> Running org.apache.ambari.server.state.alerts.AlertStateChangedEventTest
>>>> 41.462 sec
>>>> 
>>>> Running org.apache.ambari.server.state.stack.UpgradePackTest
>>>> 72.379 sec
>>>> 
>>>> Running org.apache.ambari.server.state.ConfigHelperTest
>>>> 72.849 sec
>>>> 
>>>> Running org.apache.ambari.server.state.svccomphost.ServiceComponentHostTest
>>>> 50.383 sec
>>>> 
>>>> Running org.apache.ambari.server.state.cluster.ClusterTest
>>>> 69.889 sec
>>>> 
>>>> Running org.apache.ambari.server.state.cluster.ClusterDeadlockTest
>>>> 80.271 sec
>>>> 
>>>> Running org.apache.ambari.server.state.ServiceTest
>>>> 45.443 sec
>>>> 
>>>> Running org.apache.ambari.server.orm.dao.AlertsDAOTest
>>>> 57.077 sec
>>>> 
>>>> Running org.apache.ambari.server.orm.dao.AlertDefinitionDAOTest
>>>> 33.872 sec
>>>> 
>>>> Running org.apache.ambari.server.metadata.RoleCommandOrderTest
>>>> 31.794 sec
>>>> 
>>>> Thanks,
>>>> Alejandro
>>>> 
>>>> On 3/24/15, 11:54 AM, "Jonathan Hurley"
>>>> <jhurley@hortonworks.com<mailto:jhurley@hortonworks.com>> wrote:
>>>> 
>>>> Many of these, such as the deadlock tests and alert tests, are just going
>>>> to take a long time due to the nature of what they're doing. In general,
>>>> if b.a.o is timing out, we need to either increase the timeout for the
>>>> job or change our pom.xml to allow for forked execution of the tests.
>>>> 
>>>> In my local environment, 3 concurrent forks can run through the test
>>>> suite in about 20 minutes. The problem is that both LDAP tests below
>>>> always fail in a forked environment. I'd say if we want to get the build
>>>> times down, we should look into making the 2 LDAP tests work with forked
>>>> test runners in the pom.xml.
>>>> 
>>>> On Mar 24, 2015, at 2:33 PM, Sumit Mohanty
>>>> <smohanty@hortonworks.com<mailto:smohanty@hortonworks.com>> wrote:
>>>> Hi,
>>>> these are some of the unit tests that take too long (more than 30 seconds
>>>> on my machine). There are several in the 10 to 30 second range that
>>>> could also use some optimization.
>>>> Jayush tells me that the Apache builds may be getting aborted as the
>>>> build + UT run takes more than an hour.
>>>> I will look into some of it when I get a chance. If there are any that
>>>> pique your curiosity, then take a look.
>>>> Running org.apache.ambari.server.agent.TestHeartbeatHandler
>>>> Tests run: 34, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 67.43 sec
>>>> Running org.apache.ambari.server.state.cluster.ClusterTest
>>>> Tests run: 25, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 55.576 sec
>>>> Running org.apache.ambari.server.state.cluster.ClusterDeadlockTest
>>>> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 52.252 sec
>>>> Running org.apache.ambari.server.upgrade.UpgradeTest
>>>> Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 50.433 sec
>>>> Running org.apache.ambari.server.orm.dao.AlertDispatchDAOTest
>>>> Tests run: 25, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 46.681 sec
>>>> Running org.apache.ambari.server.orm.dao.AlertsDAOTest
>>>> Tests run: 22, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 44.474 sec
>>>> Running org.apache.ambari.server.security.authorization.TestUsers
>>>> Tests run: 26, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 36.421 sec
>>>> Running org.apache.ambari.server.security.authorization.AmbariLdapAuthenticationProviderTest
>>>> Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 36.46 sec
>>>> Running org.apache.ambari.server.security.authorization.AmbariLdapAuthenticationProviderForDNWithSpaceTest
>>>> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 35.706 sec
>>>> Running org.apache.ambari.server.state.ConfigHelperTest
>>>> Tests run: 14, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 31.863 sec
>>>> Running org.apache.ambari.server.controller.internal.StackDefinedPropertyProviderTest
>>>> Tests run: 14, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 31.247 sec
>>>> ...
>>>> thanks
>>>> -Sumit
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>> 
> 
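
For anyone who wants to repeat the slow-test survey quoted above without the
grep/sed pipeline, here is a rough Python sketch of the same idea. It assumes
the test output has the shape shown in this thread (a "Running <class>" line
followed by a "Tests run: ..., Time elapsed: N sec" line); other surefire
versions may print a different format. The function name slow_test_classes
and the threshold parameter are made up for illustration and are not part of
the Ambari build.

```python
import re

# Patterns for the two line shapes quoted in this thread (an assumption
# about the log format, not a guarantee for every surefire version).
RUNNING = re.compile(r"^Running\s+(\S+)")
ELAPSED = re.compile(r"Time elapsed:\s*([\d.]+)\s*sec")


def slow_test_classes(lines, threshold_sec=30.0):
    """Return (class_name, seconds) pairs above threshold_sec, slowest first."""
    current = None
    results = []
    for line in lines:
        running = RUNNING.match(line.strip())
        if running:
            # Remember which class the next "Time elapsed" line belongs to.
            current = running.group(1)
            continue
        elapsed = ELAPSED.search(line)
        if elapsed and current is not None:
            seconds = float(elapsed.group(1))
            if seconds > threshold_sec:
                results.append((current, seconds))
            current = None
    return sorted(results, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Two entries copied from the output quoted earlier in this thread.
    sample = [
        "Running org.apache.ambari.server.state.cluster.ClusterTest",
        "Tests run: 25, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 55.576 sec",
        "Running org.apache.ambari.server.agent.TestHeartbeatHandler",
        "Tests run: 34, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 67.43 sec",
    ]
    for name, seconds in slow_test_classes(sample):
        print(f"{seconds:8.3f} sec  {name}")
```

Passing the lines of a saved log (for example, open("test_times.txt") if that
is where the output was captured) in place of the sample list should give the
same kind of listing, sorted so the slowest classes come first.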
