systemml-dev mailing list archives

From Deron Eriksson <deroneriks...@gmail.com>
Subject Re: test suite running slowly after disable cache/sparse commit?
Date Fri, 09 Dec 2016 23:52:36 GMT
Hi,

It looks like we had another timeout on the daily build:
https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/677/console

Deron


On Thu, Dec 8, 2016 at 9:59 PM, Acs S <acs_s@yahoo.com.invalid> wrote:

> +1 on adding Jenkins build machines for PR builds.
> A couple of times my PR builds had to wait in the queue. If that is not
> common, we can wait.
> -Arvind
>
> From: Deron Eriksson <deroneriksson@gmail.com>
> To: dev@systemml.incubator.apache.org
> Sent: Friday, December 9, 2016 7:34 AM
> Subject: Re: test suite running slowly after disable cache/sparse commit?
>
> Hi Fred,
>
> The last two daily tests ran in about 2 hr 56 min, so if this number is stable,
> it seems that the new tests potentially add about half an hour to the test
> suite time. I would like us to decrease the test suite time rather
> than add significantly to it. In fact, personally I'd prefer if we could do
> something like move the time-consuming algorithm-type tests out of the main
> test suite and just run the algorithm tests daily (if this is technically
> possible). That way, we could get the main test suite time to be sped up
> significantly but still benefit from daily test coverage provided by the
> algorithm tests. I like the idea of a short test suite time since that
> makes it easier to get feedback and continue working on an issue that day.
> If the tests take too long to run, it means that issues that could
> potentially be solved in one day will get pushed out to another day.
>
> Increasing the number of simultaneous Jenkins jobs allowed could help with
> queued-up builds, which would be nice. Currently Jenkins runs a max of two
> simultaneous jobs. Jenkins currently handles:
> 1) two daily builds (at noon and at midnight)
> 2) on-demand builds (so a developer can commit some code on a branch and
> then have Jenkins build/test it so that the developer's machine isn't tied up)
> 3) pull request builds (the initial push with a PR will trigger this along
> with any subsequent pushes to the branch referenced by the PR).
>
> There is no queue today, but I'm the only person who has triggered a PR build
> today. If more than two developers submit PRs on the same day, there will
> be a queue. This queue has been manageable, but if the increase in test
> suite time is a permanent thing, I'd recommend bumping the simultaneous
> Jenkins jobs from two to four.
>
> Deron
>
>
>
> On Thu, Dec 8, 2016 at 4:49 PM, Frederick R Reiss <frreiss@us.ibm.com>
> wrote:
>
> > +dev list
> >
> > I personally don't mind letting the regression suite run overnight. The
> > important thing is that we do not push changes that have not passed the
> > full automated test suite. In the interest of efficiency, we shouldn't even
> > be reviewing most PRs until after they have passed the automated tests.
> >
> > Deron, are you seeing a backlog of not-yet-started builds queueing up on
> > the PR build server? If the queue is getting long, we can add additional
> > machines to the Jenkins cluster.
> >
> > Fred
> >
> > From: Deron Eriksson/San Francisco/IBM
> > To: Niketan Pansare/Almaden/IBM@IBMUS
> > Cc: Berthold Reinwald/Almaden/IBM@IBMUS, Frederick R
> > Reiss/Almaden/IBM@IBMUS
> > Date: 12/08/2016 11:06 AM
> > Subject: Re: test suite running slowly after disable cache/sparse commit?
> > ------------------------------
> >
> >
> >
> > Hi Niketan,
> >
> > Perhaps Berthold or Fred could add a little guidance here in terms of what
> > is acceptable? Having the test suite go from 2:21 to 3:41 (one pull request
> > yesterday took 4:11 to complete:
> > https://sparktc.ibmcloud.com/jenkins/job/SystemML-PullRequestBuilder/909/)
> > is very serious to me. Even if the test suite runs at 3:00, this is a
> > serious slowdown. It slows down our ability to validate pull requests and
> > other code on jenkins.
> >
> > Deron
> >
> >
> > ----- Original message -----
> > From: Niketan Pansare/Almaden/IBM
> > To: Deron Eriksson/San Francisco/IBM@ibmus
> > Cc: Berthold Reinwald/Almaden/IBM@ibmus, Frederick R
> > Reiss/Almaden/IBM@ibmus
> > Subject: Re: test suite running slowly after disable cache/sparse commit?
> > Date: Thu, Dec 8, 2016 8:55 AM
> >
> > Hi Deron,
> >
> > The commit replicated the application tests to run with sparse disabled and
> > with caching disabled, so the test time is expected to increase. We should
> > either increase the duration or reduce the number of application tests we
> > want to run with caching and sparse disabled.
> >
> > Thanks
> >
> > Niketan
> >
> > On Dec 8, 2016, at 7:47 AM, Deron Eriksson <deron@us.ibm.com> wrote:
> >
> >    Hi Niketan,
> >
> >      I noticed the daily test yesterday timed out, probably because of a
> >      long-running test.
> >
> >      Looking at the commits from the day before (
> >      https://github.com/apache/incubator-systemml/commits/master), I
> >      noticed that [SYSTEMML-769] [SYSTEMML-1140] Removed -disable-caching and
> >      -disable-… (
> >      https://github.com/apache/incubator-systemml/commit/caaaec90b61e529e50021d89f9f108230fa307a8)
> >      updated some of the tests.
> >
> >      So I ran the tests on the previous commit (
> >      https://sparktc.ibmcloud.com/jenkins/job/SystemML-OnDemand/227/)
> >      and the tests ran in 2hr 21min.
> >
> >      I ran the tests on the 'disable caching...' commit (
> >      https://sparktc.ibmcloud.com/jenkins/job/SystemML-OnDemand/228/)
> >      and the tests ran in 3hr 41min.
> >
> >      One thing that is confusing to me is that the nightly test just
> >      completed successfully (
> >      https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/674/)
> >      in 2hr 57min and did not time out like yesterday afternoon. So it is
> >      always possible it could be a server issue.
> >
> >      Could you look into this and see if that commit introduced an issue
> >      with the tests?
> >
> >      Thanks!
> >      Deron
> >
> >
> >
> >
> >
> >
>
>
> --
> Deron Eriksson
> Spark Technology Center
> http://www.spark.tc/
>
>
>



-- 
Deron Eriksson
Spark Technology Center
http://www.spark.tc/
