systemml-dev mailing list archives

From Deron Eriksson <deroneriks...@gmail.com>
Subject Re: test suite running slowly after disable cache/sparse commit?
Date Fri, 09 Dec 2016 02:04:23 GMT
Hi Fred,

The last two daily tests each ran in about 2:56 hr, so if this number is
stable, the new tests appear to add about half an hour to the test suite
time. I would rather decrease the test suite time than add significantly
to it. In fact, I'd personally prefer to move the time-consuming
algorithm-type tests out of the main test suite and run them only in the
daily builds (if this is technically possible). That way, the main test
suite would run significantly faster, and we would still benefit from the
daily test coverage provided by the algorithm tests. I like a short test
suite time since it makes it easier to get feedback and continue working
on an issue that same day. If the tests take too long to run, issues that
could potentially be solved in one day get pushed out to another day.
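
For reference, here is a small sketch of the arithmetic behind these
numbers. It compares the 2:21 run on the previous commit (quoted further
down in this thread) against the ~2:56 daily runs and the 3:41 on-demand
run. The helper names are just for illustration.

```python
def to_minutes(duration):
    """Convert an 'H:MM' duration string such as '2:21' to minutes."""
    hours, minutes = duration.split(":")
    return int(hours) * 60 + int(minutes)


def slowdown(before, after):
    """Return (added minutes, percent increase) between two durations."""
    b, a = to_minutes(before), to_minutes(after)
    return a - b, round(100.0 * (a - b) / b, 1)


# Baseline on the previous commit vs. the last two daily tests.
print(slowdown("2:21", "2:56"))  # (35, 24.8)

# Baseline vs. the on-demand run of the 'disable caching...' commit.
print(slowdown("2:21", "3:41"))  # (80, 56.7)
```

So "about half an hour" corresponds to roughly a 25% increase over the
2:21 baseline, while the 3:41 run would be closer to a 57% increase.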

Increasing the number of simultaneous Jenkins jobs allowed could help with
queued-up builds, which would be nice. Currently Jenkins runs a max of two
simultaneous jobs. Jenkins currently handles:
1) two daily builds (at noon and at midnight)
2) on-demand builds (so a developer can commit some code on a branch and
then have Jenkins build/test it without tying up the developer's machine)
3) pull request builds (the initial push with a PR will trigger this along
with any subsequent pushes to the branch referenced by the PR).

There is no queue today, but I'm the only person who has triggered a PR
build today. If more than two developers submit PRs on the same day, there
will be a queue. So far this queue has been manageable, but if the increase
in test suite time is permanent, I'd recommend bumping the simultaneous
Jenkins jobs from two to four.
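
As a rough illustration of why the number of simultaneous jobs matters,
here is a minimal sketch. It assumes all builds are submitted at the same
moment and each takes the same time (176 minutes, i.e. the ~2:56 figure
above). The function name wait_times and this simplified model are
assumptions for illustration only, not how Jenkins actually schedules
builds.

```python
import heapq


def wait_times(num_builds, executors, build_minutes):
    """Minutes each build waits before starting, in submission order.

    Simplified model (an assumption): every build arrives at time 0 and
    takes exactly build_minutes on whichever executor frees up first.
    """
    free_at = [0] * executors  # time at which each executor becomes free
    heapq.heapify(free_at)
    waits = []
    for _ in range(num_builds):
        start = heapq.heappop(free_at)  # earliest available executor
        waits.append(start)
        heapq.heappush(free_at, start + build_minutes)
    return waits


# Four PR builds submitted together, each taking about 2:56 (176 minutes).
print(wait_times(4, executors=2, build_minutes=176))  # [0, 0, 176, 176]
print(wait_times(4, executors=4, build_minutes=176))  # [0, 0, 0, 0]
```

Under these assumptions, with two executors the third and fourth builds
do not even start for almost three hours, whereas with four executors
none of them wait.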

Deron



On Thu, Dec 8, 2016 at 4:49 PM, Frederick R Reiss <frreiss@us.ibm.com>
wrote:

> +dev list
>
> I personally don't mind letting the regression suite run overnight. The
> important thing is that we do not push changes that have not passed the
> full automated test suite. In the interest of efficiency, we shouldn't even
> be reviewing most PRs until after they have passed the automated tests.
>
> Deron, are you seeing a backlog of not-yet-started builds queueing up on
> the PR build server? If the queue is getting long, we can add additional
> machines to the Jenkins cluster.
>
> Fred
>
> From: Deron Eriksson/San Francisco/IBM
> To: Niketan Pansare/Almaden/IBM@IBMUS
> Cc: Berthold Reinwald/Almaden/IBM@IBMUS, Frederick R
> Reiss/Almaden/IBM@IBMUS
> Date: 12/08/2016 11:06 AM
> Subject: Re: test suite running slowly after disable cache/sparse commit?
> ------------------------------
>
>
>
> Hi Niketan,
>
> Perhaps Berthold or Fred could add a little guidance here in terms of what
> is acceptable? Having the test suite go from 2:21 to 3:41 (one pull request
> yesterday took 4:11 to complete -
> https://sparktc.ibmcloud.com/jenkins/job/SystemML-PullRequestBuilder/909/)
> is very serious to me. Even if the test suite runs at 3:00, this is a
> serious slowdown. It slows down our ability to validate pull requests and
> other code on jenkins.
>
> Deron
>
>
> ----- Original message -----
> From: Niketan Pansare/Almaden/IBM
> To: Deron Eriksson/San Francisco/IBM@ibmus
> Cc: Berthold Reinwald/Almaden/IBM@ibmus, Frederick R
> Reiss/Almaden/IBM@ibmus
> Subject: Re: test suite running slowly after disable cache/sparse commit?
> Date: Thu, Dec 8, 2016 8:55 AM
>
> Hi Deron,
>
> The commit replicated the application tests so that they also run with
> sparse disabled and with caching disabled, so the test time is expected
> to increase. We should either increase the duration or reduce the number
> of application tests we run with caching and sparse disabled.
>
> Thanks
>
> Niketan
>
> On Dec 8, 2016, at 7:47 AM, Deron Eriksson <deron@us.ibm.com> wrote:
>
>    Hi Niketan,
>
>       I noticed the daily test yesterday timed out, probably because of a
>       long-running test.
>
>       Looking at the commits from the day before (
>       https://github.com/apache/incubator-systemml/commits/master), I
>       noticed that [SYSTEMML-769] [SYSTEMML-1140] Removed -disable-caching and
>       -disable-… (
>       https://github.com/apache/incubator-systemml/commit/caaaec90b61e529e50021d89f9f108230fa307a8)
>       updated some of the tests.
>
>       So I ran the tests on the previous commit (
>       https://sparktc.ibmcloud.com/jenkins/job/SystemML-OnDemand/227/)
>       and the tests ran in 2hr 21min.
>
>       I ran the tests on the 'disable caching...' commit (
>       https://sparktc.ibmcloud.com/jenkins/job/SystemML-OnDemand/228/)
>       and the tests ran in 3hr 41min.
>
>       One thing that is confusing to me is that the nightly test just
>       completed successfully (
>       https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/674/)
>       in 2hr 57min and did not time out like yesterday afternoon. So it is
>       possible that the timeout was a server issue.
>
>       Could you look into this and see if that commit introduced an issue
>       with the tests?
>
>       Thanks!
>       Deron
>
>
>
>
>
>


-- 
Deron Eriksson
Spark Technology Center
http://www.spark.tc/
