hawq-dev mailing list archives

From "Kuien Liu (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HAWQ-1529) "segment resource manager" will NOT exit when postmaster died
Date Tue, 19 Sep 2017 09:39:00 GMT
Kuien Liu created HAWQ-1529:

             Summary: "segment resource manager" will NOT exit when postmaster died
                 Key: HAWQ-1529
                 URL: https://issues.apache.org/jira/browse/HAWQ-1529
             Project: Apache HAWQ
          Issue Type: Improvement
          Components: Core
            Reporter: Kuien Liu
            Assignee: Radar Lei

If I send SIGKILL to a segment's postmaster with 'kill -9', the postmaster dies, but the
"segment resource manager" and "logger process" stay alive and keep emitting a "WARNING" every 30s.

As I understand it, the "logger process" is waiting for the "segment resource manager", but the
resource manager never checks whether the postmaster is still alive, so it keeps waiting. Is this
intended? Why not exit once the postmaster is gone?
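
To illustrate the kind of check being asked for, here is a minimal sketch in C. It is not HAWQ code and not a proposed patch: parent_is_alive, poll_with_parent_check, and PARENT_GONE are hypothetical names. It assumes the child records its parent's PID at startup while the parent is still running, and relies on the fact that an orphaned process is re-parented, so getppid() changes after the parent dies.

```c
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical sentinel, distinct from poll()'s own -1 error return. */
#define PARENT_GONE (-2)

/*
 * Hypothetical helper: returns 1 while the process that started us is
 * still our parent, 0 otherwise. When a parent dies, the kernel
 * re-parents the child (to init or a subreaper), so getppid() no longer
 * matches the PID recorded at startup.
 */
static int parent_is_alive(pid_t original_parent)
{
    return getppid() == original_parent;
}

/*
 * Hypothetical loop step: wait on the descriptors with a bounded
 * timeout, then check the parent. Returns PARENT_GONE if the parent has
 * died, otherwise whatever poll() returned.
 */
static int poll_with_parent_check(struct pollfd *fds, nfds_t nfds,
                                  int timeout_ms, pid_t original_parent)
{
    int ready = poll(fds, nfds, timeout_ms);

    if (!parent_is_alive(original_parent))
        return PARENT_GONE;
    return ready;
}
```

A main loop built this way would call poll_with_parent_check on every iteration and exit when it returns PARENT_GONE. The bounded timeout matters: with an infinite timeout, the check would only run when a descriptor becomes ready.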

The call stack of RM when postmaster is killed:
#0  0x00007f19023ccab6 in poll () from /lib64/libc.so.6
#1  0x0000000000a48c9e in processAllCommFileDescs () at rmcomm_AsyncComm.c:156
#2  0x0000000000a8ce5e in MainHandlerLoop_RMSEG () at resourcemanager_RMSEG.c:166
#3  0x0000000000a8cba3 in ResManagerMainSegment2ndPhase () at resourcemanager_RMSEG.c:71
#4  0x0000000000a8d966 in ResManagerMain (argc=0x3, argv=0x7fffa018b890) at resourcemanager.c:346
#5  0x0000000000a8db45 in ResManagerProcessStartup () at resourcemanager.c:411
#6  0x0000000000899b89 in CommenceNormalOperations () at postmaster.c:3673
#7  0x000000000089a562 in do_reaper () at postmaster.c:4021
#8  0x00000000008969bb in ServerLoop () at postmaster.c:2136
#9  0x0000000000895a78 in PostmasterMain (argc=0xc, argv=0x229a730) at postmaster.c:1454
#10 0x00000000007b185d in main (argc=0xc, argv=0x229a730) at main.c:226
#11 0x00007f190231e994 in __libc_start_main () from /lib64/libc.so.6
#12 0x00000000004bde89 in _start ()

This message was sent by Atlassian JIRA
