ambari-dev mailing list archives

From Alejandro Fernandez <afernan...@hortonworks.com>
Subject Re: Server unit tests take too long (30+ minutes)
Date Wed, 25 Mar 2015 18:39:19 GMT
Fantastic, thank you Jonathan and Jayush for your urgency.

On 3/25/15, 11:28 AM, "Jonathan Hurley" <jhurley@hortonworks.com> wrote:

>Builds are passing again after fixing 10197:
>
>https://builds.apache.org/job/Ambari-trunk-Commit/2111
>
>> On Mar 24, 2015, at 11:45 PM, Jayush Luniya <jluniya@hortonworks.com> wrote:
>> 
>> Done. 
>> 
>> https://issues.apache.org/jira/browse/AMBARI-10197
>> 
>> Thanks
>> Jayush
>> 
>> 
>> On 3/24/15, 7:50 PM, "Jonathan Hurley" <jhurley@hortonworks.com> wrote:
>> 
>>> Ah, I see that. Looks like TestController.TestController is a common
>>> theme here then. I tried running the tests on CentOS 6 instead of OS X,
>>> and mine hung on test_certSigningFailed the first time and on
>>> test_heartbeat_no_host_check_cmd_in_queue the second time.
>>> 
>>> Let's open a Jira for this so it can be tracked and resolved.
>>> 
>>>> On Mar 24, 2015, at 7:20 PM, Jayush Luniya <jluniya@hortonworks.com> wrote:
>>>> 
>>>> Hi Jonathan,
>>>> Yes, as I mentioned, the unit tests hang, but not reproducibly every
>>>> time. The BOA is aborted after 2 hours.
>>>> 
>>>> However, the builds always hang during the Ambari Agent tests. If you
>>>> look further up in the logs, you will see that the actual abort happened
>>>> during the TestController UTs (i.e. Python was terminated), but the build
>>>> was not yet entirely terminated, and hence it continued building the
>>>> Ambari client and Python client until it was completely aborted.
>>>> 
>>>> test_addToStatusQueue (TestController.TestController) ... ok
>>>> test_certSigningFailed (TestController.TestController) ... ok
>>>> test_heartbeatWithServer (TestController.TestController) ... ok
>>>> test_registerAndHeartbeat (TestController.TestController) ... ok
>>>> test_registerAndHeartbeatWithException (TestController.TestController) ... ok
>>>> test_registerAndHeartbeat_check_registration_listener (TestController.TestController) ... Build timed out (after 120 minutes). Marking the build as aborted.
>>>> Build was aborted
>>>> 
>>>> /home/jenkins/jenkins-slave/workspace/Ambari-trunk-Commit/ambari-agent/../ambari-common/src/main/unix/ambari-python-wrap: line 40: 31955 Terminated          $PYTHON "$@"
>>>> [INFO]
>>>> [INFO] ------------------------------------------------------------------------
>>>> [INFO] Building Ambari Client 2.0.0-SNAPSHOT
>>>> [INFO] ------------------------------------------------------------------------
>>>> [INFO]
>>>> [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ ambari-client ---
>>>> [INFO] Deleting /home/jenkins/jenkins-slave/workspace/Ambari-trunk-Commit/ambari-client (includes = [**/*.pyc], excludes = [])
>>>> [INFO]
>>>> [INFO] --- build-helper-maven-plugin:1.8:regex-property (parse-package-version) @ ambari-client ---
>>>> [INFO]
>>>> [INFO] --- build-helper-maven-plugin:1.8:regex-property (parse-package-release) @ ambari-client ---
>>>> [INFO]
>>>> [INFO] --- apache-rat-plugin:0.11:check (default) @ ambari-client ---
>>>> [INFO] 53 implicit excludes (use -debug for more details).
>>>> [INFO] No excludes explicitly specified.
>>>> [INFO] 2 resources included (use -debug for more details)
>>>> [INFO] Rat check: Summary of files. Unapproved: 0 unknown: 0 generated: 0 approved: 2 licence.
>>>> [INFO]
>>>> [INFO] --- maven-assembly-plugin:2.2-beta-5:single (build-tarball) @ ambari-client ---
>>>> [INFO] Reading assembly descriptor: assemblies/client.xml
>>>> [INFO]
>>>> [INFO] --- maven-assembly-plugin:2.2-beta-5:single (make-assembly) @ ambari-client ---
>>>> [INFO] Reading assembly descriptor: assemblies/client.xml
>>>> [INFO]
>>>> [INFO] --- maven-install-plugin:2.4:install (default-install) @ ambari-client ---
>>>> [INFO] Installing /home/jenkins/jenkins-slave/workspace/Ambari-trunk-Commit/ambari-client/pom.xml to /home/jenkins/.m2/repository/org/apache/ambari/ambari-client/2.0.0-SNAPSHOT/ambari-client-2.0.0-SNAPSHOT.pom
>>>> [INFO]
>>>> 
>>>> [INFO] ------------------------------------------------------------------------
>>>> [INFO] Building Ambari Python Client 2.0.0-SNAPSHOT
>>>> [INFO] ------------------------------------------------------------------------
>>>> [INFO]
>>>> [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ python-client ---
>>>> [INFO] Deleting /home/jenkins/jenkins-slave/workspace/Ambari-trunk-Commit/ambari-client/python-client (includes = [**/*.pyc], excludes = [])
>>>> [INFO]
>>>> [INFO] --- build-helper-maven-plugin:1.8:regex-property (parse-package-version) @ python-client ---
>>>> [INFO]
>>>> [INFO] --- build-helper-maven-plugin:1.8:regex-property (parse-package-release) @ python-client ---
>>>> [INFO]
>>>> [INFO] --- exec-maven-plugin:1.2:exec (python-test) @ python-client ---
>>>> Updating AMBARI-10163
>>>> Recording test results
>>>> Warning: you have no plugins providing access control for builds, so falling back to legacy behavior of permitting any downstream builds to be triggered
>>>> Finished: ABORTED
>>>> 
>>>> Thanks
>>>> Jayush
>>>> 
>>>>> On 3/24/15, 1:25 PM, "Jonathan Hurley" <jhurley@hortonworks.com> wrote:
>>>> 
>>>>> I think that we're looking in the wrong places. Consider:
>>>>> 
>>>>> https://builds.apache.org/job/Ambari-trunk-Commit/2101
>>>>> and
>>>>> https://builds.apache.org/job/Ambari-trunk-Commit/2100
>>>>> 
>>>>> 2101 built successfully in about an hour. 2100 did not; it was aborted
>>>>> after 2 hours, during the Groovy unit tests. Variance in Ambari unit
>>>>> test time should not swing the total job time by an hour.
>>>>> 
>>>>> Perhaps something else is going on here. Maybe there's a network issue
>>>>> and Git or one of the Maven build steps is taking too long.
>>>>> 
>>>>> The pattern seems to be that the builds are not stuck in one place, since
>>>>> they are aborted at different stages from one job to the next: Groovy,
>>>>> agent tests, etc.
>>>>> 
>>>>> 
>>>>> On Mar 24, 2015, at 4:07 PM, Jonathan Hurley <jhurley@hortonworks.com> wrote:
>>>>> 
>>>>> No, that change should have no effect on the tests. There were aborted
>>>>> runs before that change, and there were failed runs after it. It seems
>>>>> like in some cases, the tests just take too long.
>>>>> 
>>>>> On Mar 24, 2015, at 3:55 PM, Jayush Luniya <jluniya@hortonworks.com> wrote:
>>>>> 
>>>>> This is the change that went in with build #2072.
>>>>> 
>>>>> Jonathan, any chance the issue below could have been caused by it?
>>>>> Sumit, what was the commit version of your change to re-enable the
>>>>> TestController tests, and when was it committed?
>>>>> 
>>>>> 
>>>>> 1. AMBARI-10126 <https://issues.apache.org/jira/browse/AMBARI-10126> - Alert Scheduler Is Double Scheduling Jobs (jonathanhurley) (details <https://builds.apache.org/job/Ambari-trunk-Commit/2072/changes#detail0>)
>>>>> 
>>>>> Commit 68468feeeeb35ca9edd4899ea8b1abafb7c2742a <http://git-wip-us.apache.org/repos/asf?p=ambari.git&a=commit&h=68468feeeeb35ca9edd4899ea8b1abafb7c2742a> by jhurley <https://builds.apache.org/user/jhurley/>
>>>>> AMBARI-10126 <https://issues.apache.org/jira/browse/AMBARI-10126> - Alert Scheduler Is Double Scheduling Jobs (jonathanhurley)
>>>>> 
>>>>> ambari-agent/src/main/python/ambari_agent/Controller.py <http://git-wip-us.apache.org/repos/asf?p=ambari.git&a=blob&f=ambari-agent/src/main/python/ambari_agent/Controller.py&h=bb85337bfdf2404a6aabf78eb361c112f77c977e&hb=68468feeeeb35ca9edd4899ea8b1abafb7c2742a>
>>>>> (diff) <http://git-wip-us.apache.org/repos/asf?p=ambari.git&a=blobdiff&f=ambari-agent/src/main/python/ambari_agent/Controller.py&fp=ambari-agent/src/main/python/ambari_agent/Controller.py&h=eeca4c294399e04dae8d893f078d6e6125f3df47&hp=bb85337bfdf2404a6aabf78eb361c112f77c977e&hb=68468feeeeb35ca9edd4899ea8b1abafb7c2742a&hpb=32e1215639f3cdfea68e2955f316576f1ded85fe>
>>>>> 
>>>>> 
>>>>> Thanks
>>>>> Jayush
>>>>> 
>>>>> On 3/24/15, 12:49 PM, "Sumit Mohanty" <smohanty@hortonworks.com> wrote:
>>>>> 
>>>>> The TestController tests are the ones I recently re-enabled to run on
>>>>> Mac, so you may see these failures locally as well if your dev box is a
>>>>> Mac.
>>>>> ________________________________________
>>>>> From: Jayush Luniya <jluniya@hortonworks.com>
>>>>> Sent: Tuesday, March 24, 2015 12:24 PM
>>>>> To: Alejandro Fernandez; dev@ambari.apache.org
>>>>> Subject: Re: Server unit tests take too long (30+ minutes)
>>>>> 
>>>>> Agreed we should take a look at reducing our test times.
>>>>> 
>>>>> Also, I looked at the latest builds on trunk; it looks like the agent
>>>>> tests are hanging as well, leading to builds being aborted. The culprit
>>>>> seems to be the TestController tests. This is not a consistent failure,
>>>>> but it has happened very frequently since build #2072:
>>>>> https://builds.apache.org/job/Ambari-trunk-Commit/
>>>>> 
>>>>> 
>>>>> test_repeatRegistration (TestController.TestController) ... ok
>>>>> test_restartAgent (TestController.TestController) ... ok
>>>>> test_run (TestController.TestController) ... Build timed out (after 120 minutes). Marking the build as aborted.
>>>>> Build was aborted
>>>>> 
>>>>> /home/jenkins/jenkins-slave/workspace/Ambari-trunk-Commit/ambari-agent/../ambari-common/src/main/unix/ambari-python-wrap: line 40: 20024 Terminated        $PYTHON "$@"
>>>>> 
>>>>> Thanks
>>>>> Jayush
>>>>> 
>>>>> From: Alejandro Fernandez <afernandez@hortonworks.com>
>>>>> Date: Tuesday, March 24, 2015 at 12:18 PM
>>>>> To: "dev@ambari.apache.org" <dev@ambari.apache.org>
>>>>> Cc: Jayush Luniya <jluniya@hortonworks.com>
>>>>> Subject: Re: Server unit tests take too long (30+ minutes)
>>>>> 
>>>>> +1 to that.
>>>>> 
>>>>> grep -B1 ".*sec$" ~/test_times.txt | sed 's/^.*Time elapsed: \(.*\)$/\1/'
>>>>> 
>>>>> Here's another run with all tests that took over 30 secs. Total time in
>>>>> these 28 test classes was 28 mins.
>>>>> The biggest culprit was AmbariManagementControllerTest at 5:17 (317.327 sec).
>>>>> 
>>>>> Running org.apache.ambari.server.agent.TestHeartbeatHandler
>>>>> 89.435 sec
>>>>> 
>>>>> Running org.apache.ambari.server.upgrade.UpgradeTest
>>>>> 76.566 sec
>>>>> 
>>>>> Running org.apache.ambari.server.security.authorization.AmbariLdapAuthenticationProviderForDNWithSpaceTest
>>>>> 55.582 sec
>>>>> 
>>>>> Running org.apache.ambari.server.security.authorization.TestUsers
>>>>> 43.228 sec
>>>>> 
>>>>> Running org.apache.ambari.server.security.authorization.AmbariLdapAuthenticationProviderTest
>>>>> 57.922 sec
>>>>> 
>>>>> Running org.apache.ambari.server.controller.internal.StackDefinedPropertyProviderTest
>>>>> 56.585 sec
>>>>> 
>>>>> Running org.apache.ambari.server.controller.internal.RepositoryVersionResourceProviderTest
>>>>> 60.788 sec
>>>>> 
>>>>> Running org.apache.ambari.server.controller.internal.UpgradeResourceProviderTest
>>>>> 40.329 sec
>>>>> 
>>>>> Running org.apache.ambari.server.controller.internal.HostStackVersionResourceProviderTest
>>>>> 34.812 sec
>>>>> 
>>>>> Running org.apache.ambari.server.controller.internal.StageResourceProviderTest
>>>>> 37.434 sec
>>>>> 
>>>>> Running org.apache.ambari.server.controller.AmbariServerTest
>>>>> 37.638 sec
>>>>> 
>>>>> Running org.apache.ambari.server.controller.AmbariManagementControllerTest
>>>>> 317.327 sec
>>>>> 
>>>>> Running org.apache.ambari.server.actionmanager.TestActionDBAccessorImpl
>>>>> 53.404 sec
>>>>> 
>>>>> Running org.apache.ambari.server.scheduler.ExecutionScheduleManagerTest
>>>>> 34.245 sec
>>>>> 
>>>>> Running org.apache.ambari.server.notifications.dispatchers.SNMPDispatcherTest
>>>>> 34.732 sec
>>>>> 
>>>>> Running org.apache.ambari.server.state.UpgradeHelperTest
>>>>> 35.616 sec
>>>>> 
>>>>> Running org.apache.ambari.server.state.alerts.AlertEventPublisherTest
>>>>> 62.627 sec
>>>>> 
>>>>> Running org.apache.ambari.server.state.alerts.AlertDefinitionHashTest
>>>>> 42.206 sec
>>>>> 
>>>>> Running org.apache.ambari.server.state.alerts.AlertStateChangedEventTest
>>>>> 41.462 sec
>>>>> 
>>>>> Running org.apache.ambari.server.state.stack.UpgradePackTest
>>>>> 72.379 sec
>>>>> 
>>>>> Running org.apache.ambari.server.state.ConfigHelperTest
>>>>> 72.849 sec
>>>>> 
>>>>> Running org.apache.ambari.server.state.svccomphost.ServiceComponentHostTest
>>>>> 50.383 sec
>>>>> 
>>>>> Running org.apache.ambari.server.state.cluster.ClusterTest
>>>>> 69.889 sec
>>>>> 
>>>>> Running org.apache.ambari.server.state.cluster.ClusterDeadlockTest
>>>>> 80.271 sec
>>>>> 
>>>>> Running org.apache.ambari.server.state.ServiceTest
>>>>> 45.443 sec
>>>>> 
>>>>> Running org.apache.ambari.server.orm.dao.AlertsDAOTest
>>>>> 57.077 sec
>>>>> 
>>>>> Running org.apache.ambari.server.orm.dao.AlertDefinitionDAOTest
>>>>> 33.872 sec
>>>>> 
>>>>> Running org.apache.ambari.server.metadata.RoleCommandOrderTest
>>>>> 31.794 sec
>>>>> 
>>>>> Thanks,
>>>>> Alejandro
>>>>> 
>>>>> On 3/24/15, 11:54 AM, "Jonathan Hurley" <jhurley@hortonworks.com> wrote:
>>>>> 
>>>>> Many of these, such as the deadlock tests and alert tests, are just going
>>>>> to take a long time due to the nature of what they're doing. In general,
>>>>> if b.a.o is timing out, we need to either increase the timeout for the
>>>>> job or change our pom.xml to allow for forked execution of the tests.
>>>>> 
>>>>> In my local environment, 3 concurrent forks can run through the test
>>>>> suite in about 20 minutes. The problem is that both LDAP tests below
>>>>> always fail in a forked environment. I'd say if we want to get the build
>>>>> times down, we should look into making the 2 LDAP tests work with forked
>>>>> test runners in the pom.xml.
>>>>> 
>>>>> On Mar 24, 2015, at 2:33 PM, Sumit Mohanty <smohanty@hortonworks.com> wrote:
>>>>> Hi,
>>>>> These are some of the unit tests that take too long (more than 30 seconds
>>>>> on my machine). There are several in the 10 to 30 second range that could
>>>>> also use some optimization.
>>>>> Jayush tells me that the Apache builds may be getting aborted because the
>>>>> build + UT run takes more than an hour.
>>>>> I will look into some of it when I get a chance. If any of them pique
>>>>> your curiosity, take a look.
>>>>> Running org.apache.ambari.server.agent.TestHeartbeatHandler
>>>>> Tests run: 34, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 67.43 sec
>>>>> Running org.apache.ambari.server.state.cluster.ClusterTest
>>>>> Tests run: 25, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 55.576 sec
>>>>> Running org.apache.ambari.server.state.cluster.ClusterDeadlockTest
>>>>> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 52.252 sec
>>>>> Running org.apache.ambari.server.upgrade.UpgradeTest
>>>>> Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 50.433 sec
>>>>> Running org.apache.ambari.server.orm.dao.AlertDispatchDAOTest
>>>>> Tests run: 25, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 46.681 sec
>>>>> Running org.apache.ambari.server.orm.dao.AlertsDAOTest
>>>>> Tests run: 22, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 44.474 sec
>>>>> Running org.apache.ambari.server.security.authorization.TestUsers
>>>>> Tests run: 26, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 36.421 sec
>>>>> Running org.apache.ambari.server.security.authorization.AmbariLdapAuthenticationProviderTest
>>>>> Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 36.46 sec
>>>>> Running org.apache.ambari.server.security.authorization.AmbariLdapAuthenticationProviderForDNWithSpaceTest
>>>>> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 35.706 sec
>>>>> Running org.apache.ambari.server.state.ConfigHelperTest
>>>>> Tests run: 14, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 31.863 sec
>>>>> Running org.apache.ambari.server.controller.internal.StackDefinedPropertyProviderTest
>>>>> Tests run: 14, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 31.247 sec
>>>>> ...
>>>>> thanks
>>>>> -Sumit
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>> 
>
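
For anyone who wants to repeat the "tests over 30 seconds" summary quoted above without the grep/sed pipeline, here is a minimal Python sketch. It assumes the input is captured Maven Surefire console output in the "Running <class>" / "Time elapsed: N sec" shape shown in the thread. The function name slow_tests, the threshold parameter, and the org.example.FastTest sample line are illustrative assumptions, not part of the Ambari build.

```python
import re

# Surefire prints two lines per test class, as quoted in the thread:
#   Running <class>
#   Tests run: ..., Time elapsed: <seconds> sec
RUNNING = re.compile(r"^Running\s+(\S+)")
ELAPSED = re.compile(r"Time elapsed:\s*([\d.]+)\s*sec")


def slow_tests(lines, threshold_sec=30.0):
    """Return (class_name, seconds) pairs above the threshold, slowest first.

    Hypothetical helper for illustration; not part of the Ambari build.
    """
    results = []
    current = None
    for line in lines:
        running = RUNNING.match(line.strip())
        if running:
            current = running.group(1)
            continue
        elapsed = ELAPSED.search(line)
        if elapsed and current is not None:
            seconds = float(elapsed.group(1))
            if seconds > threshold_sec:
                results.append((current, seconds))
            current = None
    return sorted(results, key=lambda item: item[1], reverse=True)


# Two lines copied from the thread plus one made-up fast test.
sample = """\
Running org.apache.ambari.server.agent.TestHeartbeatHandler
Tests run: 34, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 67.43 sec
Running org.apache.ambari.server.state.cluster.ClusterTest
Tests run: 25, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 55.576 sec
Running org.example.FastTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.5 sec
""".splitlines()

for name, seconds in slow_tests(sample):
    print("%8.3f sec  %s" % (seconds, name))
```

To run it against a real capture such as ~/test_times.txt, pass the opened file object in place of the sample list. Sorting slowest first is only a convenience for spotting the biggest contributors.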
