trafodion-dev mailing list archives

From Hans Zeller <hans.zel...@esgyn.com>
Subject Re: Non deteminism in tests (Trafodion master Daily Test Result - 165)
Date Mon, 04 Apr 2016 20:08:33 GMT
Hi Sandhya,

Yes, that's a good idea. Would it be hard to implement a "jenkins,
run-full-tests" comment that would trigger the same tests as a nightly
build? That would allow testing some complex or risky changes more
thoroughly, without holding up the small and easy ones.
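
A minimal sketch of how such a comment trigger could be recognized, assuming
the PR comments are available as (author, body) pairs. The names
`select_suite`, `FULL_TESTS` and the committer set are hypothetical and are
not part of Jenkins or any plugin; this only illustrates the decision logic.

```python
import re

# Hypothetical trigger phrase. Accepts both spellings used in this thread:
# "jenkins, run full tests" and "jenkins, run-full-tests".
FULL_TESTS = re.compile(
    r"^\s*jenkins,\s*run[- ]full[- ]tests\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def select_suite(comments, committers):
    """Return "full" if a committer asked for the nightly-equivalent run.

    comments   -- iterable of (author, body) pairs for one pull request
    committers -- set of user names allowed to request the full run
    Any other pull request gets the small "core" set.
    """
    for author, body in comments:
        if author in committers and FULL_TESTS.search(body):
            return "full"
    return "core"
```

With this shape, a small change keeps the short test cycle by default, and
only an explicit request from a committer pays for the full run.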

Hans

On Mon, Apr 4, 2016 at 12:12 PM, Sandhya Sundaresan <
sandhya.sundaresan@esgyn.com> wrote:

> For Option 2, we could have an option in Jenkins so that a committer could
> add a comment like "jenkins, run full tests" that would kick off the full
> test run.
>
> For a small change we may not need to run full tests, and committers could
> decide that.
>
> Sandhya
>
> -----Original Message-----
> From: Sandhya Sundaresan [mailto:sandhya.sundaresan@esgyn.com
> <sandhya.sundaresan@esgyn.com>]
> Sent: Monday, April 4, 2016 12:05 PM
> To: 'dev@trafodion.incubator.apache.org' <
> dev@trafodion.incubator.apache.org
> >
> Subject: RE: Non deteminism in tests (Trafodion master Daily Test Result -
> 165)
>
> Today, when this daily email comes out from Steve, I think only a couple of
> us are paying attention to the results. We try to get after the folks who
> checked in and try to see if their checkin caused the failures.
>
> Clearly this ad hoc process isn't going to work well.
>
> So there are 2 options to get past this:
>
> 1. Every pull request should kick off all the tests, not just the small
> core set.
>
> To avoid long queues, we could build some intelligence into the system that
> kicks off tests to club 3 or 4 PRs that come within the hour and run one
> full test run for all of them combined. Unless we automate this full
> regression testing, whatever Hans lists below will continue to be a
> problem. (Do we have the test machine resources to do this?)
>
> 2. The other option is for Trafodion committers to NOT commit/merge the PR
> until the results of the entire SQL regression and/or Phoenix test run
> have been posted as a comment in the PR.
>
> Thanks
>
> Sandhya
>
> -----Original Message-----
>
> From: Hans Zeller [mailto:hans.zeller@esgyn.com <hans.zeller@esgyn.com>]
>
> Sent: Monday, April 4, 2016 10:59 AM
>
> To: dev <dev@trafodion.incubator.apache.org>
>
> Subject: Re: Trafodion master Daily Test Result - 165
>
> Don't know about you, but seeing tests fail every single day makes me kind
> of indifferent to these test failures, after a few hundred of them... I
> really wish we could have an environment where the daily email we get is at
> least 8 out of 10 times a clear indicator whether a build is good or not.
>
> In other words, all tests pass for a good build, or tests fail for a bad
> build. Right now, every single day we see failures, so what does that mean?
>
> A few things we could consider:
>
>    - Categorize. IMHO that's one of the key ways to deal with errors:
>       - Deterministic issues:
>          - Failure to run relevant tests - update to an expected file
>          missing.
>          - Deterministic bugs introduced.
>       - Non-deterministic bugs - those are much harder to deal with:
>          - Non-deterministic issues in our code.
>          - Instability of the underlying platform.
>    - Document: Make sure we have JIRAs for all issues that affect
>    regression test failures.
>    - Communicate what and who causes test failures. A side-effect of the
>    previous bullet. Most of us break the build once in a while, but we
>    should try not to do it too often.
>    - A few additional things:
>       - Some of our tests are poorly designed, leading to a lot of false
>       failures. Usually because they try to test too much.
>       - Some of our tests cause failures when run twice on the development
>       platform, usually due to missing cleanup.
>
> This would take some effort. On the other hand, having a clear pass/fail
> indication for a build saves us all a lot of time.
>
> Hans
>
> On Mon, Apr 4, 2016 at 10:49 AM, Qifan Chen <qifan.chen@esgyn.com> wrote:
>
> > It probably will be too time consuming to test 3 or <n> flavors by
> > developers in general.
> >
> > Would it be possible to use a default flavor out of box (i.e.,
> > configured during a local install), or select a particular flavor if
> > there is a need?
>
> >
>
> > Thanks --Qifan
>
> >
>
> > On Mon, Apr 4, 2016 at 11:52 AM, Sandhya Sundaresan <
>
> > sandhya.sundaresan@esgyn.com> wrote:
>
> >
>
> > > Hi Steve,
>
> > > I was not suggesting taking any out of the build. I was suggesting
> > > that all 3 flavors get run nightly in the official build that you
> > > kick off, the same way full regressions are done only nightly.
> > > But if building all 3 flavors doesn't add too much to each
> > > developer's build time, I think it's fine to build all 3 flavors. It
> > > will certainly be easier to isolate build issues individually.
>
> > >
>
> > > Sandhya
>
> > >
>
> > >
>
> > > -----Original Message-----
>
> > > From: Steve Varnau [mailto:steve.varnau@esgyn.com
> <steve.varnau@esgyn.com>]
>
> > > Sent: Monday, April 4, 2016 9:46 AM
>
> > > To: dev@trafodion.incubator.apache.org
>
> > > Subject: RE: Trafodion master Daily Test Result - 165
>
> > >
>
> > > > It would be fine for developers to comment out one of the flavors to
> > > > work around an external problem like this.
> > > >
> > > > In general, however, I think it is important that we maintain all
> > > > the current flavors of TRX in the build. It is really nice that the
> > > > default build will work on any of those 3 distros. If we take any
> > > > of them out of the build, they will be more likely to break when
> > > > they are tried.
>
> > >
>
> > > --Steve
>
> > >
>
> > >
>
> > > > -----Original Message-----
>
> > > > From: Sandhya Sundaresan [mailto:sandhya.sundaresan@esgyn.com
> <sandhya.sundaresan@esgyn.com>]
>
> > > > Sent: Monday, April 4, 2016 9:38 AM
>
> > > > To: dev@trafodion.incubator.apache.org
>
> > > > Subject: RE: Trafodion master Daily Test Result - 165
>
> > > >
>
> > > > > Sounds good.
> > > > > We could deliver this - can you do that, Selva?
> > > > >
> > > > > But the next question is that developers probably don't need to
> > > > > build all the versions every time: CDH, HDP and vanilla Apache.
> > > > > But the nightly build should probably build all 3 versions each
> > > > > night to ensure they are working fine.
> > > > > Any comments?
> > > > >
> > > > > Developers should still be able to build any version through either
> > > > > a make option or an envvar setting, but the default could just be CDH.
>
> > > >
>
> > > > Sandhya
>
> > > >
>
> > > >
>
> > > > -----Original Message-----
>
> > > > From: Selva Govindarajan [mailto:selva.govindarajan@esgyn.com
> <selva.govindarajan@esgyn.com>]
>
> > > > Sent: Monday, April 4, 2016 8:38 AM
>
> > > > To: dev@trafodion.incubator.apache.org
>
> > > > Subject: RE: Trafodion master Daily Test Result - 165
>
> > > >
>
> > > >
>
> > > > > These changes seem to help to build successfully. It looks like
> > > > > Trafodion has become too specific to a particular build of HDP.
>
> > > >
>
> > > >
>
> > > >
>
> > > > Selva
>
> > > >
>
> > > >
>
> > > >
>
> > > > > index 3295b30..5c0b5a0 100644
> > > > > --- a/core/sqf/sqenvcom.sh
> > > > > +++ b/core/sqf/sqenvcom.sh
> > > > > @@ -142,8 +142,8 @@ export SQ_HOME=$PWD
> > > > > # set common version to be consistent between shared lib and maven dependencies
> > > > > export HBASE_DEP_VER_CDH=1.0.0-cdh5.4.4
> > > > > export HIVE_DEP_VER_CDH=1.1.0-cdh5.4.4
> > > > > -export HBASE_DEP_VER_HDP=1.1.2.2.3.2.0-2950
> > > > > -export HIVE_DEP_VER_HDP=1.2.1.2.3.2.0-2950
> > > > > +export HBASE_DEP_VER_HDP=1.1.2
> > > > > +export HIVE_DEP_VER_HDP=1.2.1
> > > > > export HBASE_DEP_VER_APACHE=1.0.2
> > > > > export HIVE_DEP_VER_APACHE=1.1.0
> > > > > export HBASE_TRX_ID_CDH=hbase-trx-cdh5_4
> > > > > diff --git a/core/sqf/src/seatrans/hbase-trx/pom.xml.hdp b/core/sqf/src/seatrans/hbase-trx/pom.xml.hdp
> > > > > index 045772c..8592b58 100755
> > > > > --- a/core/sqf/src/seatrans/hbase-trx/pom.xml.hdp
> > > > > +++ b/core/sqf/src/seatrans/hbase-trx/pom.xml.hdp
> > > > > @@ -49,7 +49,7 @@
> > > > >    </repositories>
> > > > >    <properties>
> > > > > -    <hadoop.version>2.7.1.2.3.2.0-2950</hadoop.version>
> > > > > +    <hadoop.version>2.7.1</hadoop.version>
> > > > >      <hbase.version>${env.HBASE_DEP_VER_HDP}</hbase.version>
> > > > >      <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
> > > > >      <java.version>1.7</java.version>
>
> > > >
>
> > > >
>
> > > >
>
> > > > *From:* Sandhya Sundaresan [mailto:sandhya.sundaresan@esgyn.com
> <sandhya.sundaresan@esgyn.com>]
>
> > > > *Sent:* Monday, April 4, 2016 7:56 AM
>
> > > > *To:* dev@trafodion.incubator.apache.org
>
> > > > *Subject:* RE: Trafodion master Daily Test Result - 165
>
> > > >
>
> > > >
>
> > > >
>
> > > > > We might need Ming's change (attached) checked in before we proceed.
> > > > > He said it’s a workaround, but perhaps that is the fix? Need
> > > > > Hans, Prashanth to chime in here to verify.
> > > > >
> > > > > The build failed just like a few others said it did for them. I
> > > > > saw this same error (below) in the make.log for the Sat build and
> > > > > today's build.
>
> > > >
>
> > > > Sandhya
>
> > > >
>
> > > > <<...>>
>
> > > >
>
> > > > > [ERROR] Failed to execute goal on project hbase-trx-hdp2_3: Could
> > > > > not resolve dependencies for project
> > > > > org.apache:hbase-trx-hdp2_3:jar:2.0.0:
> > > > > Failed to collect dependencies for
> > > > > [org.apache.hbase:hbase-common:jar:1.1.2.2.3.2.0-2950 (compile),
> > > > > org.apache.hbase:hbase-protocol:jar:1.1.2.2.3.2.0-2950 (compile),
> > > > > org.apache.hbase:hbase-client:jar:1.1.2.2.3.2.0-2950 (compile),
> > > > > org.apache.hbase:hbase-server:jar:1.1.2.2.3.2.0-2950 (compile),
> > > > > org.apache.hbase:hbase-thrift:jar:1.1.2.2.3.2.0-2950 (compile),
> > > > > org.apache.hbase:hbase-testing-util:jar:1.1.2.2.3.2.0-2950 (test),
> > > > > org.apache.thrift:libthrift:jar:0.9.1 (compile),
> > > > > commons-logging:commons-logging:jar:1.1.3 (compile),
> > > > > org.apache.zookeeper:zookeeper:jar:3.4.6 (compile),
> > > > > com.google.protobuf:protobuf-java:jar:2.5.0 (compile),
> > > > > org.apache.hadoop:hadoop-mapreduce-client-core:jar:2.7.1.2.3.2.0-2950 (compile),
> > > > > org.apache.hadoop:hadoop-common:jar:2.7.1.2.3.2.0-2950
> > > > > (compile)]: Failed to read artifact descriptor for
> > > > > org.apache.hbase:hbase-common:jar:1.1.2.2.3.2.0-2950: Could not
> > > > > transfer artifact
> > > > > org.apache.hbase:hbase-common:pom:1.1.2.2.3.2.0-2950
> > > > > from/to HDPReleases
> > > > > (http://repo.hortonworks.com/content/repositories/releases/):
> > > > > Failed to transfer file:
> > > > > http://repo.hortonworks.com/content/repositories/releases/org/apache/hbase/hbase-common/1.1.2.2.3.2.0-2950/hbase-common-1.1.2.2.3.2.0-2950.pom.
> > > > > Return code is: 500 , ReasonPhrase:Server Error. -> [Help 1]     ##(SQF)
>
> > > >
>
> > > > > [ERROR]         ##(SQF)
> > > > > [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.     ##(SQF)
> > > > > [ERROR] Re-run Maven using the -X switch to enable full debug logging.     ##(SQF)
> > > > > [ERROR]         ##(SQF)
> > > > > [ERROR] For more information about the errors and possible solutions, please read the following articles:       ##(SQF)
> > > > > [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException     ##(SQF)
>
> > > >
>
> > > > > make[3]: *** [jdk_1_7_hdp] Error 1      ##(SQF)
> > > > > make[3]: Leaving directory `/home/jenkins/workspace/build-master-debug/trafodion/core/sqf/src/seatrans/hbase-trx'     ##(SQF)
> > > > > make[2]: *** [cp_trx_jar] Error 2       ##(SQF)
> > > > > make[2]: Leaving directory `/home/jenkins/workspace/build-master-debug/trafodion/core/sqf/src/tm'     ##(SQF)
> > > > > make[1]: *** [tm] Error 2
> > > > > make[1]: *** Waiting for unfinished jobs....
>
> > > >
>
> > > > > ude/log4cxx/log4cxx -c -o Linux-x86_64/64/dbg/reqqueue.o reqqueue.cxx     ##(SQF)
> > > > > Finished building target: Linux-x86_64/64/dbg/reqqueue.o     ##(SQF)
> > > > >         ##(SQF)
> > > > > Building target: Linux-x86_64/64/dbg/reqattstartup.o    ##(SQF)
> > > > > Invoking: C++ Compiler  ##(SQF)
> > > > > /opt/traf/tools/dest-mpich-3.0.4/bin/mpicxx
> > > > > -Wp,-MD,Linux-x86_64/64/dbg/depend/reqattstartup.cxx.dep
> > > > > -Wp,-MT,Linux-x86_64/64/dbg/reqattstartup.o -Wno-long-long
> > > > > -fmessage-length=0 -g -Wno-deprecated -fmessage-length=0 -DDMALLOC
> > > > > -DUSE_MON_LOGGING  -D_MPICC_H -UNDEBUG -Wall -Wextra -DMON_DEBUG
> > > > > -DUSE_TESTPOINTS
> > > > > -I/home/jenkins/workspace/build-master-debug/trafodion/core/sqf/export/include
> > > > > -I -I../../inc -I../../commonLogger -I/usr/include/log4cxx
> > > > > -I/usr/include/log4cxx/log4cxx  -c -o
> > > > > Linux-x86_64/64/dbg/reqattstartup.o
> > > > > reqattstartup.cxx        ##(SQF)
> > > > > Finished building target: Linux-x86_64/64/dbg/reqattstartup.o     ##(SQF)
> > > > >         ##(SQF)
>
> > > >
>
> > > > -----Original Message-----
>
> > > > From: Anoop Sharma [mailto:anoop.sharma@esgyn.com
> <anoop.sharma@esgyn.com>
>
> > > > <anoop.sharma@esgyn.com>]
>
> > > > Sent: Monday, April 4, 2016 7:19 AM
>
> > > > To: dev@trafodion.incubator.apache.org
>
> > > > Subject: RE: Trafodion master Daily Test Result - 165
>
> > > >
>
> > > > > Tests have not been running since Saturday.
> > > > >
> > > > > What's causing it?
>
> > > >
>
> > > >
>
> > > >
>
> > > > -----Original Message-----
>
> > > >
>
> > > > From: steve.varnau@esgyn.com [mailto:steve.varnau@esgyn.com
> <steve.varnau@esgyn.com>
>
> > > > <steve.varnau@esgyn.com>]
>
> > > >
>
> > > > Sent: Monday, April 4, 2016 1:44 AM
>
> > > >
>
> > > > To: dev@trafodion.incubator.apache.org
>
> > > >
>
> > > > Subject: Trafodion master Daily Test Result - 165
>
> > > >
>
> > > > Daily Automated Testing master
>
> > > >
>
> > > > Jenkins Job:   https://jenkins.esgyn.com/job/Check-Daily-master/165/
>
> > > >
>
> > > > Archived Logs: http://traf-testlogs.esgyn.com/Daily-master/165
>
> > > >
>
> > > > Bld Downloads: http://traf-builds.esgyn.com
>
> > > >
>
> > > > Changes since previous daily build:
>
> > > >
>
> > > > No changes
>
> > > >
>
> > > >
>
> > > >
>
> > > > Test Job Results:
>
> > > >
>
> > > > > FAILURE build-master-debug (2 min 46 sec)
> > > > > FAILURE build-master-release (4 min 13 sec)
> > > > > FAILURE core-regress-charsets-cdh (1 min 36 sec)
> > > > > FAILURE core-regress-charsets-hdp (8 min 20 sec)
> > > > > FAILURE core-regress-compGeneral-cdh (1 min 40 sec)
> > > > > FAILURE core-regress-compGeneral-hdp (8 min 45 sec)
> > > > > FAILURE core-regress-core-cdh (1 min 37 sec)
> > > > > FAILURE core-regress-core-hdp (1 min 26 sec)
> > > > > FAILURE core-regress-executor-cdh (1 min 35 sec)
> > > > > FAILURE core-regress-executor-hdp (8 min 48 sec)
> > > > > FAILURE core-regress-fullstack2-cdh (6 min 1 sec)
> > > > > FAILURE core-regress-fullstack2-hdp (9 min 10 sec)
> > > > > FAILURE core-regress-hive-cdh (1 min 38 sec)
> > > > > FAILURE core-regress-hive-hdp (9 min 19 sec)
> > > > > FAILURE core-regress-privs1-cdh (1 min 36 sec)
> > > > > FAILURE core-regress-privs1-hdp (7 min 17 sec)
> > > > > FAILURE core-regress-privs2-cdh (1 min 38 sec)
> > > > > FAILURE core-regress-privs2-hdp (1 min 33 sec)
> > > > > FAILURE core-regress-qat-cdh (1 min 37 sec)
> > > > > FAILURE core-regress-qat-hdp (1 min 33 sec)
> > > > > FAILURE core-regress-seabase-cdh (6 min 1 sec)
> > > > > FAILURE core-regress-seabase-hdp (9 min 14 sec)
> > > > > FAILURE core-regress-udr-cdh (1 min 38 sec)
> > > > > FAILURE core-regress-udr-hdp (9 min 29 sec)
> > > > > FAILURE jdbc_test-cdh (7 min 59 sec)
> > > > > FAILURE jdbc_test-hdp (1 min 26 sec)
> > > > > FAILURE phoenix_part1_T2-cdh (1 min 37 sec)
> > > > > FAILURE phoenix_part1_T2-hdp (1 min 33 sec)
> > > > > FAILURE phoenix_part1_T4-cdh (8 min 2 sec)
> > > > > FAILURE phoenix_part1_T4-hdp (1 min 32 sec)
> > > > > FAILURE phoenix_part2_T2-cdh (1 min 36 sec)
> > > > > FAILURE phoenix_part2_T2-hdp (1 min 33 sec)
> > > > > FAILURE phoenix_part2_T4-cdh (5 min 41 sec)
> > > > > FAILURE phoenix_part2_T4-hdp (1 min 21 sec)
> > > > > FAILURE pyodbc_test-cdh (5 min 44 sec)
> > > > > FAILURE pyodbc_test-hdp (1 min 34 sec)
>
> > >
>
> >
>
> >
>
> >
>
> > --
>
> > Regards, --Qifan
>
> >
>
