qpid-dev mailing list archives

From "Alan Conway (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (PROTON-1727) [epoll proactor] segfaults, hangs and leaked FDs around failed connect
Date Fri, 22 Dec 2017 17:14:00 GMT

     [ https://issues.apache.org/jira/browse/PROTON-1727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Conway resolved PROTON-1727.
---------------------------------
    Resolution: Fixed

> [epoll proactor] segfaults, hangs and leaked FDs around failed connect
> ----------------------------------------------------------------------
>
>                 Key: PROTON-1727
>                 URL: https://issues.apache.org/jira/browse/PROTON-1727
>             Project: Qpid Proton
>          Issue Type: Bug
>          Components: proton-c
>    Affects Versions: proton-c-0.18.1
>            Reporter: Alan Conway
>            Assignee: Alan Conway
>            Priority: Blocker
>             Fix For: proton-c-0.20.0
>
>
> There is a race condition that causes leaked FDs and segfaults in the epoll proactor under the following conditions:
> - there is more than one thread processing proactor events.
> - attempting to connect to a host address that resolves to multiple socket addresses, e.g. resolving the NULL hostname on a machine with ipv4 and ipv6 enabled.
> - there is nothing listening on the target port.
> The attached reproducer shows several bad behaviors:
> - under rr or valgrind (--tool=memcheck and --tool=helgrind) it quickly (< 1min) shows race conditions and/or invalid memory access.
> - it hangs fairly often even without valgrind/rr, more so if you increase the thread count. Without valgrind/rr it rarely segfaults.
> - it leaks FDs - the test should run forever, but runs out of FDs around 1024 iterations.
> This is probably the cause of https://issues.apache.org/jira/browse/DISPATCH-902, which does occur very frequently under the conditions described there.
> The test program should run forever without leaking or showing any faults.
> Note that gcc -fsanitize does not detect races or memory errors, which suggests the bug requires a delay at the right time to manifest. Valgrind's overhead and rr's code serialization appear to provide that delay. It seems likely that dispatch's reconnect logic is providing the delay in DISPATCH-902.
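
For readers unfamiliar with the failing scenario, the following is a minimal single-threaded sketch of the connect pattern the issue describes: a NULL hostname resolved to possibly several socket addresses, each tried in turn, with every failed socket closed before moving on. It is not the attached reproducer and not the proactor's code. It uses blocking POSIX calls for simplicity, and the names connect_any and count_open_fds are hypothetical helpers introduced here for illustration only.

```c
#include <fcntl.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: count open descriptors in [0, limit).
 * fcntl(F_GETFD) fails with EBADF on a descriptor that is not open. */
static int count_open_fds(int limit) {
    int n = 0;
    for (int fd = 0; fd < limit; ++fd) {
        if (fcntl(fd, F_GETFD) != -1) ++n;
    }
    return n;
}

/* Hypothetical helper: resolve host/port and try each resulting address.
 * Returns a connected descriptor, or -1 if every attempt failed.
 * If tried is non-NULL it is incremented once per address attempted.
 * Each socket whose connect() fails is closed before the next attempt,
 * so a failed connect leaves no descriptor behind. */
static int connect_any(const char *host, const char *port, int *tried) {
    struct addrinfo hints;
    struct addrinfo *res = NULL;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;     /* allow both IPv4 and IPv6 results */
    hints.ai_socktype = SOCK_STREAM;

    /* A NULL host without AI_PASSIVE resolves to the loopback address(es). */
    if (getaddrinfo(host, port, &hints, &res) != 0) return -1;

    int fd = -1;
    for (struct addrinfo *ai = res; ai != NULL; ai = ai->ai_next) {
        if (tried) ++*tried;
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0) continue;        /* e.g. address family unavailable */
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0) break;
        close(fd);                   /* failed attempt: release the FD */
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}
```

Calling connect_any(NULL, port, &tried) in a loop against a port with nothing listening, and comparing count_open_fds(1024) before and after, gives a rough single-threaded baseline in which the descriptor count should stay constant. The bug reported here concerns the multi-threaded, non-blocking epoll proactor path, which this sketch does not exercise.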



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)