trafodion-dev mailing list archives

From Sandhya Sundaresan <sandhya.sundare...@esgyn.com>
Subject RE: Non determinism in tests (Trafodion master Daily Test Result - 165)
Date Mon, 04 Apr 2016 19:12:52 GMT

For Option 2, we could have an option in Jenkins so that a committer could
add a comment like "jenkins, run full tests" that would kick off the full
test run.

For a small change we may not need to run full tests, and committers could
decide that.

Sandhya
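
A minimal sketch of the comment trigger suggested above, assuming the CI
system passes each PR comment's text and author to a hook. The function name,
the committer set, and the exact matching rule are hypothetical; only the
phrase "jenkins, run full tests" comes from the message.

```python
import re

# Trigger phrase from the example comment in the message above.
# Matching is case-insensitive and the comma is optional (an assumption).
FULL_TEST_TRIGGER = re.compile(r"\bjenkins,?\s+run\s+full\s+tests\b", re.IGNORECASE)


def wants_full_tests(comment_body, author, committers):
    """Return True when a committer's PR comment asks for the full test run."""
    if author not in committers:
        # Only committers may start the long-running full suite.
        return False
    return FULL_TEST_TRIGGER.search(comment_body) is not None
```

Restricting the trigger to committers keeps the decision about small changes
with the people who merge them, as proposed above.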

-----Original Message-----
From: Sandhya Sundaresan [mailto:sandhya.sundaresan@esgyn.com]
Sent: Monday, April 4, 2016 12:05 PM
To: 'dev@trafodion.incubator.apache.org' <dev@trafodion.incubator.apache.org>
Subject: RE: Non determinism in tests (Trafodion master Daily Test Result -
165)

Today, when this daily email comes out from Steve, I think only a couple of
us are paying attention to the results. We try to get after the folks who
checked in and try to see if their checkin caused the failures.

Clearly this ad hoc process isn't going to work well.

So, 2 options to get past this:

1.  Every pull request should kick off all the tests, not just the small
core set.

To avoid long queues, we could build some intelligence into the system that
kicks off tests to club 3 or 4 PRs that come within the hour and run one
full test run for all of them combined. Unless we automate this full
regression testing, whatever Hans lists below will continue to be a problem.
(Do we have the test machine resources to do this?)

2. The other option is for Trafodion committers to NOT commit/merge the PR
until the results of the entire SQL regression and/or Phoenix test run
have been posted as a comment in the PR.

Thanks

Sandhya
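
The batching idea in option 1 could be sketched as follows. This is only an
illustration under stated assumptions: PRs are given as (id, arrival time)
pairs, a batch holds at most 4 PRs, and a batch closes once a PR arrives more
than one hour after the first PR in it. The function and parameter names are
hypothetical.

```python
from datetime import datetime, timedelta


def batch_pull_requests(arrivals, window=timedelta(hours=1), max_batch=4):
    """Group (pr_id, arrival_time) pairs into batches that share one full run.

    A batch is closed when it already holds max_batch PRs, or when the next
    PR arrives more than `window` after the first PR in the batch.
    """
    batches = []
    current = []
    batch_start = None
    for pr_id, arrived in sorted(arrivals, key=lambda item: item[1]):
        if current and (len(current) >= max_batch or arrived - batch_start > window):
            batches.append(current)
            current = []
        if not current:
            batch_start = arrived
        current.append(pr_id)
    if current:
        batches.append(current)
    return batches
```

One trade-off to keep in mind: when a combined run fails, the run alone does
not say which PR in the batch caused it, so the PRs would still have to be
tested separately to find the culprit.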

-----Original Message-----
From: Hans Zeller [mailto:hans.zeller@esgyn.com]
Sent: Monday, April 4, 2016 10:59 AM
To: dev <dev@trafodion.incubator.apache.org>
Subject: Re: Trafodion master Daily Test Result - 165

Don't know about you, but seeing tests fail every single day makes me kind
of indifferent to these test failures, after a few hundred of them... I
really wish we could have an environment where the daily email we get is at
least 8 out of 10 times a clear indicator whether a build is good or not.

In other words, all tests pass for a good build, or tests fail for a bad
build. Right now, every single day we see failures, so what does that mean?

A few things we could consider:

   - Categorize. IMHO that's one of the key ways to deal with errors:
      - Deterministic issues:
         - Failure to run relevant tests - update to an expected file
         missing.
         - Deterministic bugs introduced.
      - Non-deterministic bugs - those are much harder to deal with:
         - Non-deterministic issues in our code.
         - Instability of the underlying platform.
   - Document: Make sure we have JIRAs for all issues that affect
   regression test failures.
   - Communicate what and who causes test failures. A side-effect of the
   previous bullet. Most of us break the build once in a while, but we
   should try not to do it too often.
   - A few additional things:
      - Some of our tests are poorly designed, leading to a lot of false
      failures. Usually because they try to test too much.
      - Some of our tests cause failures when run twice on the development
      platform, usually due to missing cleanup.

This would take some effort. On the other hand, having a clear pass/fail
indication for a build saves us all a lot of time.

Hans
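
As a rough first cut at the "Categorize" step above, a test could be rerun
several times against the same build and sorted by whether its outcome
changes. The sketch below assumes each test's outcomes are available as a
list of pass/fail booleans; the category labels and function names are
hypothetical, and it cannot tell a bug in our code from platform instability.

```python
def classify_test(outcomes):
    """Classify one test from repeated runs against the same build.

    outcomes: list of booleans, one per run, True for a pass.
    """
    if not outcomes:
        return "not run"
    if all(outcomes):
        return "pass"
    if not any(outcomes):
        # Fails on every run: likely a real bug or a stale expected file.
        return "deterministic failure"
    # Mixed results on identical input point at a flaky test or platform.
    return "non-deterministic failure"


def summarize(results):
    """Map each category to the sorted names of the tests that fall into it."""
    summary = {}
    for name, outcomes in results.items():
        summary.setdefault(classify_test(outcomes), []).append(name)
    return {category: sorted(names) for category, names in summary.items()}
```

A test that fails on every rerun may still be non-deterministic with a low
pass rate, so the labels are a starting point for filing JIRAs rather than a
verdict.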

On Mon, Apr 4, 2016 at 10:49 AM, Qifan Chen <qifan.chen@esgyn.com> wrote:

> It will probably be too time-consuming for developers in general to
> test 3 or <n> flavors.
>
> Would it be possible to use a default flavor out of the box (i.e.,
> configured during a local install), or select a particular flavor if
> there is a need?
>
> Thanks --Qifan

>

> On Mon, Apr 4, 2016 at 11:52 AM, Sandhya Sundaresan <
> sandhya.sundaresan@esgyn.com> wrote:
>
> > Hi Steve,
> > I was not suggesting taking any out of the build. I was suggesting
> > that all 3 flavors get run nightly in the official build that you kick
> > off, the same way full regressions are done only nightly.
> > But if building all 3 flavors doesn't add too much to each
> > developer's build time, I think it's fine to build all 3 flavors. It
> > will certainly be easier to isolate build issues individually.
> >
> > Sandhya

> >

> >

> > -----Original Message-----
> > From: Steve Varnau [mailto:steve.varnau@esgyn.com]
> > Sent: Monday, April 4, 2016 9:46 AM
> > To: dev@trafodion.incubator.apache.org
> > Subject: RE: Trafodion master Daily Test Result - 165
> >
> > It would be fine for developers to comment out one of the flavors to
> > work around an external problem like this.
> >
> > In general, however, I think it is important that we maintain all
> > the current flavors of TRX in the build.  It is really nice that the
> > default build will work on any of those 3 distros.  If we take any
> > of them out of the build, they will be more likely to break when
> > they are tried.
> >
> > --Steve

> >

> >

> > > > -----Original Message-----
> > > > From: Sandhya Sundaresan [mailto:sandhya.sundaresan@esgyn.com]
> > > > Sent: Monday, April 4, 2016 9:38 AM
> > > > To: dev@trafodion.incubator.apache.org
> > > > Subject: RE: Trafodion master Daily Test Result - 165
> > > >
> > > Sounds good.
> > > We could deliver this - can you do that, Selva?
> > >
> > > But the next question is that developers probably don't need to
> > > build all the versions (cdh, hdp and vanilla Apache) every time.
> > > But the nightly build should probably build all 3 versions to
> > > ensure they are working fine.
> > > Any comments?
> > >
> > > Developers should still be able to build any version by either a
> > > make option or an envvar setting, though the default could just be
> > > cdh.
> > >
> > > Sandhya

> > >

> > >

> > > -----Original Message-----
> > > From: Selva Govindarajan [mailto:selva.govindarajan@esgyn.com]
> > > Sent: Monday, April 4, 2016 8:38 AM
> > > To: dev@trafodion.incubator.apache.org
> > > Subject: RE: Trafodion master Daily Test Result - 165
> > >
> > > These changes seem to help the build succeed.  It looks like
> > > Trafodion has become too specific to a particular build of HDP.
> > >
> > > Selva
> > >
> > > index 3295b30..5c0b5a0 100644
> > > --- a/core/sqf/sqenvcom.sh
> > > +++ b/core/sqf/sqenvcom.sh
> > > @@ -142,8 +142,8 @@ export SQ_HOME=$PWD
> > > # set common version to be consistent between shared lib and maven dependencies
> > > export HBASE_DEP_VER_CDH=1.0.0-cdh5.4.4
> > > export HIVE_DEP_VER_CDH=1.1.0-cdh5.4.4
> > > -export HBASE_DEP_VER_HDP=1.1.2.2.3.2.0-2950
> > > -export HIVE_DEP_VER_HDP=1.2.1.2.3.2.0-2950
> > > +export HBASE_DEP_VER_HDP=1.1.2
> > > +export HIVE_DEP_VER_HDP=1.2.1
> > > export HBASE_DEP_VER_APACHE=1.0.2
> > > export HIVE_DEP_VER_APACHE=1.1.0
> > > export HBASE_TRX_ID_CDH=hbase-trx-cdh5_4
> > > diff --git a/core/sqf/src/seatrans/hbase-trx/pom.xml.hdp b/core/sqf/src/seatrans/hbase-trx/pom.xml.hdp
> > > index 045772c..8592b58 100755
> > > --- a/core/sqf/src/seatrans/hbase-trx/pom.xml.hdp
> > > +++ b/core/sqf/src/seatrans/hbase-trx/pom.xml.hdp
> > > @@ -49,7 +49,7 @@
> > >    </repositories>
> > >    <properties>
> > > -    <hadoop.version>2.7.1.2.3.2.0-2950</hadoop.version>
> > > +    <hadoop.version>2.7.1</hadoop.version>
> > >      <hbase.version>${env.HBASE_DEP_VER_HDP}</hbase.version>
> > >      <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
> > >      <java.version>1.7</java.version>

> > >

> > >

> > >

> > > *From:* Sandhya Sundaresan [mailto:sandhya.sundaresan@esgyn.com]
> > > *Sent:* Monday, April 4, 2016 7:56 AM
> > > *To:* dev@trafodion.incubator.apache.org
> > > *Subject:* RE: Trafodion master Daily Test Result - 165
> > >
> > > We might need Ming's change (attached) checked in before we proceed.
> > > He said it's a workaround, but perhaps that is the fix?  Need
> > > Hans, Prashanth to chime in here to verify.
> > >
> > > The build failed just like a few others said it did for them.  I
> > > saw this same error (below) in the make.log for the Sat build and
> > > today's build.
> > >
> > > Sandhya
> > >
> > > <<...>>

> > >

> > > [ERROR] Failed to execute goal on project hbase-trx-hdp2_3: Could
> > > not resolve dependencies for project org.apache:hbase-trx-hdp2_3:jar:2.0.0:
> > > Failed to collect dependencies for
> > > [org.apache.hbase:hbase-common:jar:1.1.2.2.3.2.0-2950 (compile),
> > > org.apache.hbase:hbase-protocol:jar:1.1.2.2.3.2.0-2950 (compile),
> > > org.apache.hbase:hbase-client:jar:1.1.2.2.3.2.0-2950 (compile),
> > > org.apache.hbase:hbase-server:jar:1.1.2.2.3.2.0-2950 (compile),
> > > org.apache.hbase:hbase-thrift:jar:1.1.2.2.3.2.0-2950 (compile),
> > > org.apache.hbase:hbase-testing-util:jar:1.1.2.2.3.2.0-2950 (test),
> > > org.apache.thrift:libthrift:jar:0.9.1 (compile),
> > > commons-logging:commons-logging:jar:1.1.3 (compile),
> > > org.apache.zookeeper:zookeeper:jar:3.4.6 (compile),
> > > com.google.protobuf:protobuf-java:jar:2.5.0 (compile),
> > > org.apache.hadoop:hadoop-mapreduce-client-core:jar:2.7.1.2.3.2.0-2950 (compile),
> > > org.apache.hadoop:hadoop-common:jar:2.7.1.2.3.2.0-2950 (compile)]:
> > > Failed to read artifact descriptor for
> > > org.apache.hbase:hbase-common:jar:1.1.2.2.3.2.0-2950: Could not
> > > transfer artifact org.apache.hbase:hbase-common:pom:1.1.2.2.3.2.0-2950
> > > from/to HDPReleases
> > > (http://repo.hortonworks.com/content/repositories/releases/):
> > > Failed to transfer file:
> > > http://repo.hortonworks.com/content/repositories/releases/org/apache/hbase/hbase-common/1.1.2.2.3.2.0-2950/hbase-common-1.1.2.2.3.2.0-2950.pom.
> > > Return code is: 500 , ReasonPhrase:Server Error. -> [Help 1]  ##(SQF)
> > > [ERROR]         ##(SQF)
> > > [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.     ##(SQF)
> > > [ERROR] Re-run Maven using the -X switch to enable full debug logging.  ##(SQF)
> > > [ERROR]         ##(SQF)
> > > [ERROR] For more information about the errors and possible solutions, please read the following articles:       ##(SQF)
> > > [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException  ##(SQF)
> > > make[3]: *** [jdk_1_7_hdp] Error 1      ##(SQF)
> > > make[3]: Leaving directory `/home/jenkins/workspace/build-master-debug/trafodion/core/sqf/src/seatrans/hbase-trx'  ##(SQF)
> > > make[2]: *** [cp_trx_jar] Error 2       ##(SQF)
> > > make[2]: Leaving directory `/home/jenkins/workspace/build-master-debug/trafodion/core/sqf/src/tm'  ##(SQF)
> > > make[1]: *** [tm] Error 2
> > > make[1]: *** Waiting for unfinished jobs....

> > >

> > > ude/log4cxx/log4cxx -c -o Linux-x86_64/64/dbg/reqqueue.o reqqueue.cxx  ##(SQF)
> > > Finished building target: Linux-x86_64/64/dbg/reqqueue.o  ##(SQF)
> > >         ##(SQF)
> > > Building target: Linux-x86_64/64/dbg/reqattstartup.o    ##(SQF)
> > > Invoking: C++ Compiler  ##(SQF)
> > > /opt/traf/tools/dest-mpich-3.0.4/bin/mpicxx
> > > -Wp,-MD,Linux-x86_64/64/dbg/depend/reqattstartup.cxx.dep
> > > -Wp,-MT,Linux-x86_64/64/dbg/reqattstartup.o -Wno-long-long
> > > -fmessage-length=0 -g -Wno-deprecated -fmessage-length=0 -DDMALLOC
> > > -DUSE_MON_LOGGING  -D_MPICC_H -UNDEBUG -Wall -Wextra -DMON_DEBUG
> > > -DUSE_TESTPOINTS
> > > -I/home/jenkins/workspace/build-master-debug/trafodion/core/sqf/export/include
> > > -I -I../../inc -I../../commonLogger -I/usr/include/log4cxx
> > > -I/usr/include/log4cxx/log4cxx  -c -o
> > > Linux-x86_64/64/dbg/reqattstartup.o reqattstartup.cxx        ##(SQF)
> > > Finished building target: Linux-x86_64/64/dbg/reqattstartup.o  ##(SQF)
> > >         ##(SQF)

> > >

> > > -----Original Message-----
> > > From: Anoop Sharma [mailto:anoop.sharma@esgyn.com]
> > > Sent: Monday, April 4, 2016 7:19 AM
> > > To: dev@trafodion.incubator.apache.org
> > > Subject: RE: Trafodion master Daily Test Result - 165
> > >
> > > Tests have not been running since Saturday.
> > > What's causing it?

> > >

> > >

> > >

> > > -----Original Message-----
> > > From: steve.varnau@esgyn.com [mailto:steve.varnau@esgyn.com]
> > > Sent: Monday, April 4, 2016 1:44 AM
> > > To: dev@trafodion.incubator.apache.org
> > > Subject: Trafodion master Daily Test Result - 165
> > >
> > > Daily Automated Testing master
> > >
> > > Jenkins Job:   https://jenkins.esgyn.com/job/Check-Daily-master/165/
> > > Archived Logs: http://traf-testlogs.esgyn.com/Daily-master/165
> > > Bld Downloads: http://traf-builds.esgyn.com
> > >
> > > Changes since previous daily build:
> > > No changes
> > >
> > > Test Job Results:
> > >
> > > FAILURE build-master-debug (2 min 46 sec)
> > > FAILURE build-master-release (4 min 13 sec)
> > > FAILURE core-regress-charsets-cdh (1 min 36 sec)
> > > FAILURE core-regress-charsets-hdp (8 min 20 sec)
> > > FAILURE core-regress-compGeneral-cdh (1 min 40 sec)
> > > FAILURE core-regress-compGeneral-hdp (8 min 45 sec)
> > > FAILURE core-regress-core-cdh (1 min 37 sec)
> > > FAILURE core-regress-core-hdp (1 min 26 sec)
> > > FAILURE core-regress-executor-cdh (1 min 35 sec)
> > > FAILURE core-regress-executor-hdp (8 min 48 sec)
> > > FAILURE core-regress-fullstack2-cdh (6 min 1 sec)
> > > FAILURE core-regress-fullstack2-hdp (9 min 10 sec)
> > > FAILURE core-regress-hive-cdh (1 min 38 sec)
> > > FAILURE core-regress-hive-hdp (9 min 19 sec)
> > > FAILURE core-regress-privs1-cdh (1 min 36 sec)
> > > FAILURE core-regress-privs1-hdp (7 min 17 sec)
> > > FAILURE core-regress-privs2-cdh (1 min 38 sec)
> > > FAILURE core-regress-privs2-hdp (1 min 33 sec)
> > > FAILURE core-regress-qat-cdh (1 min 37 sec)
> > > FAILURE core-regress-qat-hdp (1 min 33 sec)
> > > FAILURE core-regress-seabase-cdh (6 min 1 sec)
> > > FAILURE core-regress-seabase-hdp (9 min 14 sec)
> > > FAILURE core-regress-udr-cdh (1 min 38 sec)
> > > FAILURE core-regress-udr-hdp (9 min 29 sec)
> > > FAILURE jdbc_test-cdh (7 min 59 sec)
> > > FAILURE jdbc_test-hdp (1 min 26 sec)
> > > FAILURE phoenix_part1_T2-cdh (1 min 37 sec)
> > > FAILURE phoenix_part1_T2-hdp (1 min 33 sec)
> > > FAILURE phoenix_part1_T4-cdh (8 min 2 sec)
> > > FAILURE phoenix_part1_T4-hdp (1 min 32 sec)
> > > FAILURE phoenix_part2_T2-cdh (1 min 36 sec)
> > > FAILURE phoenix_part2_T2-hdp (1 min 33 sec)
> > > FAILURE phoenix_part2_T4-cdh (5 min 41 sec)
> > > FAILURE phoenix_part2_T4-hdp (1 min 21 sec)
> > > FAILURE pyodbc_test-cdh (5 min 44 sec)
> > > FAILURE pyodbc_test-hdp (1 min 34 sec)

> --
> Regards, --Qifan
