trafodion-dev mailing list archives

From Radu Marias <radumar...@gmail.com>
Subject Trafodion won't start, core files are generated
Date Tue, 06 Oct 2015 16:21:24 GMT
Hi,

At some point a node in our 5-node cluster stopped and we needed to
restart it. After that I restarted all the Ambari and HDP services, but
Trafodion fails to start.

Below are some stack traces, plus details for the core files for which I'm
not getting any stack. The files are from node1 and node2 and were created
on Oct 2 (when I think node2 was down) and Oct 6 (when we rebooted the node
and tried to start Trafodion). Feel free to connect and debug the issue on
our cluster; Amanda has the credentials.

*FROM NODE1*

Oct  2 22:27 core.39347
core.39347: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from 'tm SQMON1.1 00000 00000 039347 $TM0 188.138.61.175:60186 00002 00000 00009 SPAR'
gdb /home/trafodion/trafodion-20150828_0830/export/bin64/tdm_udrserv core.39347
no stack
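The `file` header above records the command line of the process that dumped core, and it starts with `tm`, while gdb was pointed at `tdm_udrserv`. gdb can only symbolize a core against the executable that produced it, so a mismatch like this is one plausible reason for "no stack". A minimal Python sketch for pulling the program name out of `file` output, assuming the name is the first token inside the `from '...'` clause (the helper name `core_program_name` is hypothetical):

```python
import re
from typing import Optional

# Matches the quoted command line that `file` reports for an ELF core,
# e.g. "... SVR4-style, from 'tm SQMON1.1 00000 ...'".
_FROM_RE = re.compile(r"from '([^']*)'")


def core_program_name(file_output: str) -> Optional[str]:
    """Return the first token of the command line recorded in a core file.

    Hypothetical helper: assumes `file_output` is the text printed by
    `file <core>`; returns None when no "from '...'" clause is present.
    """
    match = _FROM_RE.search(file_output)
    if match is None:
        return None
    tokens = match.group(1).split()
    return tokens[0] if tokens else None


if __name__ == "__main__":
    sample = (
        "core.39347: ELF 64-bit LSB core file x86-64, version 1 (SYSV), "
        "SVR4-style, from 'tm SQMON1.1 00000 00000 039347 $TM0 "
        "188.138.61.175:60186 00002 00000 00009 SPAR'"
    )
    print(core_program_name(sample))  # prints "tm"
```

The same check applies to core.15309 and core.39491, whose headers also start with `tm`; loading each core with the matching `tm` executable instead of `tdm_udrserv` may produce a usable backtrace.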

Oct  2 22:41 core.15144
Program terminated with signal 6, Aborted.
#0  0x00007f77bcbbb625 in ?? ()
#1  0x00007f77bcbbce05 in ?? ()
#2  0x0000000000000010 in ?? () at ../common/Collections.cpp:109
#3  0x00007f77bee62130 in ?? ()
#4  0x00007ffe8e796ec0 in ?? ()
#5  0x00007f77bdeced00 in ?? ()
#6  0x0000000000000004 in ?? () at ../common/Collections.cpp:109
#7  0x0000000001b3a310 in ?? ()
#8  0x0000000000000000 in ?? ()

Oct  2 22:41 core.39240
#0  0x00007f534d03c625 in raise () from /lib64/libc.so.6
#1  0x00007f534d03de05 in abort () from /lib64/libc.so.6
#2  0x00007f534d03574e in __assert_fail_base () from /lib64/libc.so.6
#3  0x00007f534d035810 in __assert_fail () from /lib64/libc.so.6
#4  0x000000000046e213 in CExtTmLeaderReq::performRequest (this=0x7f53340008c0) at reqtmleader.cxx:126
#5  0x000000000045a64a in CReqWorker::reqWorkerThread (this=<value optimized out>) at reqworker.cxx:79
#6  0x000000000045a86d in reqWorker (arg=0xc6f9a0) at reqworker.cxx:147
#7  0x00007f534db45a51 in start_thread () from /lib64/libpthread.so.0
#8  0x00007f534d0f29ad in clone () from /lib64/libc.so.6

Oct  2 22:41 core.15309
core.15309: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from 'tm SQMON1.1 00000 00000 015309 $TM0 188.138.61.175:60186 00002 00000 00134 SPAR'
gdb /home/trafodion/trafodion-20150828_0830/export/bin64/tdm_udrserv core.15309
no stack


*FROM NODE2*

Oct  2 22:29 core.39491
core.39491: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from 'tm SQMON1.1 00001 00001 039491 $TM1 188.138.61.177:38680 00002 00001 00003 SPAR'
gdb /home/trafodion/trafodion-20150828_0830/export/bin64/tdm_udrserv core.39491
no stack

Oct  6 15:23 core.1394
Program terminated with signal 6, Aborted.
#0  0x00007fb97acbf625 in raise () from /lib64/libc.so.6
#1  0x00007fb97acc0e05 in abort () from /lib64/libc.so.6
#2  0x000000000041d07d in CProcessContainer::CProcessContainer (this=0x2071880, nodeContainer=<value optimized out>) at process.cxx:3366
#3  0x0000000000453f5c in CNode::CNode (this=0x2071880, name=0x204c448 "euve79672", pnid=0, rank=0) at pnode.cxx:153
#4  0x00000000004558e0 in CNodeContainer::AddNodes (this=<value optimized out>) at pnode.cxx:1564
#5  0x00000000004169a5 in CCluster::InitializeConfigCluster (this=0x20757b0) at cluster.cxx:2740
#6  0x0000000000417645 in CCluster::CCluster (this=0x20757b0) at cluster.cxx:567
#7  0x0000000000431e1a in CTmSync_Container::CTmSync_Container (this=0x20757b0) at tmsync.cxx:137
#8  0x0000000000407bb6 in CMonitor::CMonitor (this=0x20757b0, procTermSig=9) at monitor.cxx:323
#9  0x00000000004086ad in main (argc=2, argv=0x7fff8322e298) at monitor.cxx:1152

Oct  6 15:43 core.17626
Program terminated with signal 6, Aborted.
#0  0x00007fcf11aea625 in raise () from /lib64/libc.so.6
#1  0x00007fcf11aebe05 in abort () from /lib64/libc.so.6
#2  0x000000000041d07d in CProcessContainer::CProcessContainer (this=0x1182890, nodeContainer=<value optimized out>) at process.cxx:3366
#3  0x0000000000453f5c in CNode::CNode (this=0x1182890, name=0x115d458 "euve79672", pnid=0, rank=0) at pnode.cxx:153
#4  0x00000000004558e0 in CNodeContainer::AddNodes (this=<value optimized out>) at pnode.cxx:1564
#5  0x00000000004169a5 in CCluster::InitializeConfigCluster (this=0x11867c0) at cluster.cxx:2740
#6  0x0000000000417645 in CCluster::CCluster (this=0x11867c0) at cluster.cxx:567
#7  0x0000000000431e1a in CTmSync_Container::CTmSync_Container (this=0x11867c0) at tmsync.cxx:137
#8  0x0000000000407bb6 in CMonitor::CMonitor (this=0x11867c0, procTermSig=9) at monitor.cxx:323
#9  0x00000000004086ad in main (argc=2, argv=0x7ffcaca91f68) at monitor.cxx:1152
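The two Oct 6 cores on node2 (core.1394 and core.17626) differ only in addresses, `this` pointers and argv. When more cores pile up, a small Python sketch like the one below can group them by call site. It assumes one frame per line, as printed by gdb's `bt` without wrapping, and the helper names `frame_signature` and `same_crash_site` are hypothetical:

```python
import re
from typing import List, Optional, Tuple

# Function name follows " in " in a gdb frame line, e.g.
# "#2  0x...41d07d in CProcessContainer::CProcessContainer (this=...) at process.cxx:3366".
_FUNC_RE = re.compile(r"\bin\s+([^\s(]+)")
# A source location ("at file:line") or shared object ("from /lib64/...") ends the line.
_LOC_RE = re.compile(r"\b(?:at|from)\s+(\S+)\s*$")


def frame_signature(backtrace: str) -> List[Tuple[str, Optional[str]]]:
    """Reduce a gdb `bt` listing to (function, location) pairs.

    Addresses, `this` pointers and argument values change between runs, so
    they are dropped; only what identifies the call site is kept.
    """
    frames: List[Tuple[str, Optional[str]]] = []
    for line in backtrace.splitlines():
        line = line.strip()
        if not line.startswith("#"):
            continue  # skip "Program terminated ..." and other non-frame lines
        func = _FUNC_RE.search(line)
        loc = _LOC_RE.search(line)
        frames.append((func.group(1) if func else "??",
                       loc.group(1) if loc else None))
    return frames


def same_crash_site(bt_a: str, bt_b: str) -> bool:
    """True when two backtraces have identical call-site signatures."""
    return frame_signature(bt_a) == frame_signature(bt_b)
```

Fed the two node2 traces above, both reduce to the same ten frames, with `abort` called from `CProcessContainer::CProcessContainer` at process.cxx:3366 while the monitor is being constructed.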

-- 
And in the end, it's not the years in your life that count. It's the life
in your years.
