qpid-dev mailing list archives

From "Gordon Sim (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DISPATCH-994) segfault in qdr_link_second_attach
Date Thu, 10 May 2018 21:40:00 GMT

    [ https://issues.apache.org/jira/browse/DISPATCH-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16471157#comment-16471157 ]

Gordon Sim commented on DISPATCH-994:

Digging a little deeper, the issue here is the re-use of a link name on a session before the
previous use has fully closed. The test case attached here is arguably incorrect, as it does
not wait for the connection to be confirmed as closed before resubscribing with the same link
names. However, even a modified version that does so can cause the same problem. DISPATCH-997
is a different symptom of the same root cause, and the test there also waits for connection
close before reusing. If router-c is run under valgrind, that too can trigger this segfault.

The only way to avoid it would be for the application to wait for the link detach to be confirmed
before closing the connection. That is not something that can be relied on. If the connection
ends (cleanly or due to disconnect) before the link is closed, then the router will confirm
the close of the connection before waiting for the detach it relays down the link route to
be echoed back.

If you get an attach with the same name before the detach for the previous use of that name
has been echoed back, then the previous link is not fully closed (it is locally open, remotely
closed), and when proton handles the attach it returns the previous link object, which is in
the wrong state. This leads to one of two failures. Either the router incorrectly treats the
attach as the echoing back of a router-initiated link, which causes the segfault described in
this issue because the correct context has not been set up, or the attach is ignored and never
echoed back. The former happens when the detach is echoed back slowly, so running router-c
under valgrind makes it more likely.
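
To make the sequence above concrete, here is a toy model of the race. It is only a sketch under
stated assumptions: it does not use proton or Dispatch code, and Link, Session, classify_attach
and run are hypothetical names invented for illustration. The model assumes that links are looked
up by name within a session and that the router treats an attach on a locally open link as the
reply to an attach it initiated itself.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Link:
    """Hypothetical stand-in for a link endpoint with separate local/remote state."""
    name: str
    local_open: bool = False
    remote_open: bool = False


class Session:
    """Toy session that finds links by name (an assumption of this sketch)."""

    def __init__(self) -> None:
        self._links: Dict[str, Link] = {}

    def remote_attach(self, name: str) -> Link:
        # An incoming attach reuses any link object still registered under the name.
        link = self._links.get(name)
        if link is None:
            link = Link(name)
            self._links[name] = link
        link.remote_open = True
        return link

    def remote_detach(self, name: str) -> None:
        # The peer detached; the link stays registered until it is closed locally too.
        self._links[name].remote_open = False

    def local_close(self, name: str) -> None:
        # Models the relayed detach being echoed back: the router closes its end.
        link = self._links[name]
        link.local_open = False
        if not link.remote_open:
            del self._links[name]  # fully closed, so the name is free again


def classify_attach(link: Link) -> str:
    # Assumed router logic: a locally open link means "we attached first".
    return "second-attach" if link.local_open else "first-attach"


def run(reattach_before_echo: bool) -> str:
    session = Session()
    first = session.remote_attach("rx-1")
    assert classify_attach(first) == "first-attach"
    first.local_open = True          # router accepts the link and opens its end
    session.remote_detach("rx-1")    # client goes away; detach is relayed onward
    if not reattach_before_echo:
        session.local_close("rx-1")  # echo arrived before the name was reused
    return classify_attach(session.remote_attach("rx-1"))


if __name__ == "__main__":
    print(run(reattach_before_echo=False))
    print(run(reattach_before_echo=True))
```

In this model, reusing the name after the echo yields a fresh link that is classified as a first
attach, while reusing it before the echo returns the stale, locally open object and the attach is
misclassified as a second attach.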

Fundamentally I think this is an issue in using the same session for all routed links, where
the links are detached asynchronously.

> segfault in qdr_link_second_attach
> ----------------------------------
>                 Key: DISPATCH-994
>                 URL: https://issues.apache.org/jira/browse/DISPATCH-994
>             Project: Qpid Dispatch
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>            Reporter: Gordon Sim
>            Priority: Major
>         Attachments: router-a.conf, router-b.conf, router-c.conf, topic_test.py
> Link routing from router A through router B to a 'broker', and closing and opening two receivers causes a segfault.
> {noformat}
> ==25674== Thread 4:
> ==25674== Invalid read of size 8
> ==25674==    at 0x4E77EEF: qdr_link_second_attach (connections.c:474)
> ==25674==    by 0x4E87142: AMQP_link_attach_handler (router_node.c:680)
> ==25674==    by 0x4E8BF2B: handle (server.c:940)
> ==25674==    by 0x4E8CBA7: thread_run (server.c:958)
> ==25674==    by 0x54FA739: start_thread (in /usr/lib64/libpthread-2.24.so)
> ==25674==    by 0x6288E7E: clone (in /usr/lib64/libc-2.24.so)
> ==25674==  Address 0x10 is not stack'd, malloc'd or (recently) free'd
> ==25674== 
> ==25674== 
> ==25674== Process terminating with default action of signal 11 (SIGSEGV): dumping core
> ==25674==  Access not within mapped region at address 0x10
> ==25674==    at 0x4E77EEF: qdr_link_second_attach (connections.c:474)
> ==25674==    by 0x4E87142: AMQP_link_attach_handler (router_node.c:680)
> ==25674==    by 0x4E8BF2B: handle (server.c:940)
> ==25674==    by 0x4E8CBA7: thread_run (server.c:958)
> ==25674==    by 0x54FA739: start_thread (in /usr/lib64/libpthread-2.24.so)
> ==25674==    by 0x6288E7E: clone (in /usr/lib64/libc-2.24.so)
> ==25674==  If you believe this happened as a result of a stack
> ==25674==  overflow in your program's main thread (unlikely but
> ==25674==  possible), you can try to increase the size of the
> ==25674==  main thread stack using the --main-stacksize= flag.
> ==25674==  The main thread stack size used in this run was 8388608
> {noformat}
> To reproduce, start three routers with the attached config files, then run the attached python test program.

