trafodion-dev mailing list archives

From: Eric Owhadi
Subject: RE: I am puzzled with a jenkins test failure on CORE/TESTRTS
Date: Thu, 21 Jan 2016 21:38:22 GMT

The issue is on the predicatePushdownV2 branch, derived from the trafodion
master branch. Does that mean that we can ignore the failure and accept the
merge into master on PR255?

-----Original Message-----
From: Sandhya Sundaresan
Sent: Thursday, January 21, 2016 3:23 PM
Subject: RE: I am puzzled with a jenkins test failure on CORE/TESTRTS

There were 2 instances earlier (see attached), and Selva had requested that
ABORT_ON_ERROR=8926 be set in the Jenkins environment so he could debug if
the issue happened again. Steve mentioned that ABORT_ON_ERROR is still on.
Hence we see the core file. I know Selva had problems figuring out why this
was happening but he may get more info from the core file now.
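
For readers unfamiliar with this kind of setting, here is a minimal sketch of how an abort-on-error switch can turn a specific SQL error into a core file. It is an illustration only, not the Trafodion implementation: the helpers shouldAbortOn and maybeAbort are hypothetical names, and the only details taken from this thread are the variable name ABORT_ON_ERROR and the value 8926.

```cpp
#include <cstdlib>

// Hypothetical helper: returns true when the SQLCODE being recorded matches
// the error number given in the environment variable value. The comparison
// uses the absolute value, since SQL errors are reported as negative codes.
// envValue may be nullptr when the variable is not set.
bool shouldAbortOn(long sqlcode, const char* envValue) {
    if (envValue == nullptr || *envValue == '\0') return false;
    char* end = nullptr;
    long wanted = std::strtol(envValue, &end, 10);
    if (end == envValue || *end != '\0') return false;  // not a plain number
    return std::labs(sqlcode) == wanted;
}

// Hypothetical hook, meant to be called wherever a diagnostic's SQLCODE is set.
void maybeAbort(long sqlcode) {
    if (shouldAbortOn(sqlcode, std::getenv("ABORT_ON_ERROR"))) {
        std::abort();  // raises SIGABRT; a core file results if core dumps are enabled
    }
}
```

With ABORT_ON_ERROR=8926 in the environment, a call such as maybeAbort(-8926) would abort the process at the point where the error is recorded, which is why the resulting core file captures the stack at that moment.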

What was the issue on a different branch? Was it on the AdvEnt branch?


-----Original Message-----
From: Anoop Sharma
Sent: Thursday, January 21, 2016 1:13 PM
Subject: RE: I am puzzled with a jenkins test failure on CORE/TESTRTS


Looks like some problem with runtime stats (RTS). It is missing.

There was an issue related to missing RTS area on another branch.

Not sure if that has been merged in here. We can check that.

Selva can comment some more but he is on a plane right now.

   >>error 8926;

      *** SQLSTATE (Err): X08PQ SQLSTATE (Warn): 01KPQ

      *** ERROR[8926] The given SQLSTATS_DESC_STATS_TYPE is not found in
      the merged runtime statistics.


-----Original Message-----
From: Eric Owhadi
Sent: Thursday, January 21, 2016 1:02 PM
Subject: I am puzzled with a jenkins test failure on CORE/TESTRTS

Dear Trafodioneers,

For the last 2 days I have been hunting, with the great help of Steve Arnaud,
a weird failure blocking the merge of PR 255 (predicate pushdown V2).


13 days ago, PR 255 was passing Jenkins.

6 days ago, after applying changes related to code review on PR 255 and
syncing to latest master, Jenkins started failing core/TESTRTS with a
core dump.

So I created a fake PR on a new branch, with the initial code from 13 days
ago, and merged it with latest master -> Jenkins fails with same error.

This demonstrates that the PR rework was not the root cause of this sudden wrong

So the issue is a combination of my new code (initial or after rework) with
some changes in master that happened between 13 days ago and 6 days ago.

This failure does not happen on dev environment (tested both in debug and
release mode).

Steve was able to reproduce it on a Jenkins server, and narrowed down the
condition for its appearance to the sequence of core/TEST005 followed by

Without TEST005 as a catalyst, the issue does not manifest on the Jenkins
server either.

The stack trace at time of explosion is not very helpful and shows (in red
8926 is the const value of EXE_STAT_NOT_FOUND):

EXE_STAT_NOT_FOUND can come from 34 different code paths, and unfortunately,
the structure of the code does not help narrow down which one of the 34
was crossed at time of death with stack trace analysis. -> I hate Murphy…
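
One general way to tell apart many code paths that raise the same error code is to record the call site when the error is raised. The following is a minimal sketch of that idea, not Trafodion code: RaisedError, raiseErrorAt, RAISE_ERROR, describe and kStatNotFound are all hypothetical names introduced for illustration; only the value 8926 comes from this message.

```cpp
#include <string>

// Hypothetical record of a raised error, including where it was raised.
struct RaisedError {
    int sqlcode;
    const char* file;
    int line;
};

// Hypothetical helper that stores the call site along with the error code.
inline RaisedError raiseErrorAt(int sqlcode, const char* file, int line) {
    return RaisedError{sqlcode, file, line};
}

// The macro expands at each call site, so __FILE__ and __LINE__ identify it.
#define RAISE_ERROR(code) raiseErrorAt((code), __FILE__, __LINE__)

// Formats the error so two sites raising the same code can be told apart.
inline std::string describe(const RaisedError& e) {
    return "SQLCODE " + std::to_string(e.sqlcode) + " raised at " +
           e.file + ":" + std::to_string(e.line);
}

// Mirrors the value of EXE_STAT_NOT_FOUND quoted in this message.
constexpr int kStatNotFound = 8926;
```

Two calls to RAISE_ERROR(-kStatNotFound) on different source lines then yield records with different line values, so a log line or a debugger inspection of the record points at the specific path instead of only at the shared error-handling routine.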

Thread 1 (Thread 0x7f26080e23c0 (LWP 1505)):

#0  0x00007f2605260625 in raise () from /lib64/

#1  0x00007f2605261d8d in abort () from /lib64/

#2  0x00007f2604d39494 in ComCondition::setSQLCODE (this=<value optimized
out>, newSQLCODE=-8926) at ../export/ComDiags.cpp:1428

#3  0x00007f2603911c56 in ExHandleErrors (qparent=..., down_entry=<value
optimized out>, matchNo=<value optimized out>, globals=<value optimized
out>, diags_in=<value optimized out>, err=4294958370, intParam1=0x0,
stringParam1=0x0, nskErr=0x0, stringParam2=0x0) at


#4  0x00007f2603a01e26 in ExExeUtilGetRTSStatisticsTcb::work
(this=0x7f25f39eec58) at ../executor/ExExeUtilGetStats.cpp:4222

#5  0x00007f2603a5ee33 in ExScheduler::work (this=0x7f25f39ee7c0,
prevWaitTime=<value optimized out>) at ../executor/ExScheduler.cpp:331

#6  0x00007f2603973752 in ex_root_tcb::execute (this=0x7f25f39f4c50,
cliGlobals=0x2b70120, glob=0x7f25f39b2ca8, input_desc=0x7f25f39aa030,
diagsArea=@0x7ffd2e100750, reExecute=0) at ../executor/ex_root.cpp:1058

#7  0x00007f2604fe7654 in CliStatement::execute (this=0x7f25f39c0ea0,
cliGlobals=0x2b70120, input_desc=0x7f25f39aa030, diagsArea=<value optimized
out>, execute_state=<value optimized out>, fixupOnly=0, cliflags=0) at


#8  0x00007f2604f892ac in SQLCLI_PerformTasks(CliGlobals *, ULng32,
SQLSTMT_ID *, SQLDESC_ID *, SQLDESC_ID *, Lng32, Lng32, typedef
__va_list_tag __va_list_tag *, SQLCLI_PTR_PAIRS *, SQLCLI_PTR_PAIRS *)
(cliGlobals=0x2b70120, tasks=4882, statement_id=0x3398010,
input_descriptor=0x34c0eb0, output_descriptor=0x0, num_input_ptr_pairs=0,
num_output_ptr_pairs=0, ap=0x7ffd2e1008f0, input_ptr_pairs=0x0,
output_ptr_pairs=0x0) at ../cli/Cli.cpp:3297

#9  0x00007f2604f89fe2 in SQLCLI_Exec(CliGlobals *, SQLSTMT_ID *, SQLDESC_ID
*, Lng32, typedef __va_list_tag __va_list_tag *, SQLCLI_PTR_PAIRS *)
(cliGlobals=<value optimized out>, statement_id=<value optimized out>,
input_descriptor=<value optimized out>, num_ptr_pairs=<value optimized out>,
ap=<value optimized out>, ptr_pairs=<value optimized out>) at

#10 0x00007f2604ff588b in SQL_EXEC_Exec (statement_id=0x3398010,
input_descriptor=0x34c0eb0, num_ptr_pairs=0) at ../cli/CliExtern.cpp:2074

#11 0x00007f26078bb99b in SqlCmd::doExec (sqlci_env=0x2b58c50,
stmt=0x3398010, prep_stmt=<value optimized out>, numUnnamedParams=<value
optimized out>, unnamedParamArray=<value optimized out>,
unnamedParamCharSetArray=<value optimized out>, handleError=1) at


#12 0x00007f26078bc392 in SqlCmd::do_execute (sqlci_env=0x2b58c50,
prep_stmt=0x2cefd50, numUnnamedParams=0, unnamedParamArray=0x0,
unnamedParamCharSetArray=0x0, prepcode=0) at ../sqlci/SqlCmd.cpp:2122

#13 0x00007f26078bcabd in DML::process (this=0x2cf0080,
sqlci_env=0x2b58c50) at ../sqlci/SqlCmd.cpp:2897

#14 0x00007f26078a2844 in Obey::process (this=0x2ceff60, sqlci_env=<value
optimized out>) at ../sqlci/Obey.cpp:267

#15 0x00007f26078a2844 in Obey::process (this=0x385d960, sqlci_env=<value
optimized out>) at ../sqlci/Obey.cpp:267

#16 0x00007f26078a2844 in Obey::process (this=0x4662980, sqlci_env=<value
optimized out>) at ../sqlci/Obey.cpp:267

#17 0x00007f26078ab074 in SqlciEnv::run (this=0x2b58c50, in_filename=<value
optimized out>, input_string=<value optimized out>) at


#18 0x00000000004019d2 in main (argc=2, argv=0x7ffd2e102578) at
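
A side note on reading the trace: the err=4294958370 shown in frame #3 is consistent with -8926 viewed as an unsigned 32-bit value, since 4294967296 - 8926 = 4294958370. The short check below verifies that arithmetic; asUnsigned32 is a hypothetical helper name, and the assumption that the parameter is displayed as a 32-bit unsigned quantity is an inference from the printed value, not something stated in the trace.

```cpp
#include <cstdint>

// Converts a signed 32-bit value to unsigned. This conversion is defined
// modulo 2^32, so negative values wrap around to large positive ones.
constexpr std::uint32_t asUnsigned32(std::int32_t v) {
    return static_cast<std::uint32_t>(v);
}

// Compile-time check: -8926 wraps to the err value printed in frame #3.
static_assert(asUnsigned32(-8926) == 4294958370u,
              "-8926 as uint32 should equal 4294958370");
```

So frame #3 is carrying the same error as newSQLCODE=-8926 in frame #2, only printed through an unsigned type.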


So if anyone has a smart idea on what could be causing this, I am very
interested. I am now going to see if debugging from the Jenkins box can give
any better insight.

BTW, I was also going to run a fake PR on just latest master without any of
the PR255 code, but Steve correctly highlighted that other PRs are ongoing
and not failing that test… so this test case is useless…


