thrift-user mailing list archives

From: "Sammons, Mark" <mssam...@illinois.edu>
Subject: RE: non-blocking servers are leaking sockets
Date: Mon, 20 Jan 2014 20:15:09 GMT
Hi, Jules.

I'm not sure my problems are completely analogous to yours, but I had a 
situation where a client program making many short calls to a remote
thrift server was getting a "no route to host" exception after some number
of calls, and it appeared to be due to slow release of closed sockets.  I found
that adding a short (20ms) delay between calls resolved the problem.  
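
A minimal sketch of that kind of pacing, assuming the remote call can be wrapped in a Callable. The class and method names (PacedCalls, callThenPause) are hypothetical and are not part of Thrift; the 20 ms figure is the delay mentioned above.

```java
import java.util.concurrent.Callable;

final class PacedCalls {
    /**
     * Runs one call, then sleeps for pauseMillis before returning, so that
     * back-to-back calls are spaced out. This only works around slow socket
     * release; it does not fix the underlying cause.
     */
    static <T> T callThenPause(Callable<T> rpc, long pauseMillis) throws Exception {
        T result = rpc.call();       // the (hypothetical) remote call
        Thread.sleep(pauseMillis);   // e.g. 20 ms between calls
        return result;
    }
}
```

A caller would wrap each client invocation, for example `PacedCalls.callThenPause(() -> client.someMethod(arg), 20)`, where `client.someMethod` stands in for whatever generated Thrift method is being called.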

I realize this is not exactly a solution, but it has at least allowed me to 
keep working...

Regards,

Mark

________________________________________
From: Jules Cisek [jules@luminate.com]
Sent: Monday, January 20, 2014 12:39 PM
To: user@thrift.apache.org
Subject: non-blocking servers are leaking sockets

i'm running java TThreadedSelectorServer and THsHaServer based servers and
both seem to be leaking sockets (thrift 0.9.0)

googling around searching for answers i keep running into
https://issues.apache.org/jira/browse/THRIFT-1653 which puts the blame on
the TCP config on the server while acknowledging that perhaps a problem in
the application layer does exist (see last entry)

i prefer not to mess with the TCP config on the machine because it is used
for various tasks, also i did not have these issues with a
TThreadPoolServer and a TSocket (blocking + TBufferedTransport) or any
non-thrift server on the same machine.

what happens is i get a bunch of TCP connections in a CLOSE_WAIT state and
these remain in that state indefinitely.  but what is even more concerning,
i get many sockets that don't show up in netstat at all and only lsof can
show me that they exist.  on Linux lsof shows them as "can't identify
protocol".  according to
https://idea.popcount.org/2012-12-09-lsof-cant-identify-protocol/ these
sockets are in a "half closed state" and the linux kernel has no idea what
to do with them.
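
For context, CLOSE_WAIT means the peer has closed its end but the local process has not yet closed its own socket. The sketch below uses plain java.net sockets on loopback, not Thrift's server code, to show the general pattern: a read that returns -1 signals that the peer has closed, and the descriptor is only released once the accepting side calls close(). The class name CloseWaitDemo is hypothetical.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

final class CloseWaitDemo {
    /** Returns what read() reports after the peer has closed (-1 = end of stream). */
    static int readAfterPeerClose() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket listener = new ServerSocket(0, 1, loopback)) {
            Socket client = new Socket(loopback, listener.getLocalPort());
            Socket accepted = listener.accept();
            client.close(); // peer closes; the accepted socket enters CLOSE_WAIT
            try {
                return accepted.getInputStream().read(); // -1: peer is gone
            } finally {
                accepted.close(); // without this the socket stays in CLOSE_WAIT
            }
        }
    }
}
```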

i'm pretty sure there's a problem with misbehaving clients, but the server
should not leak resources because of a client-side bug.

my only recourse is to run a cronjob that looks at the lsof output and
restarts the server whenever the socket count gets dangerously close to
"too many open files" (8192 in my case)

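The watchdog check described above could be sketched roughly as follows, assuming the lsof output lines are collected elsewhere and passed in as strings. The class name LeakWatch, the method names, and the 0.9 fraction in the usage note are hypothetical; only the "can't identify protocol" marker and the 8192 limit come from the message.

```java
import java.util.List;

final class LeakWatch {
    // Text lsof prints for sockets it cannot classify.
    static final String LEAK_MARKER = "can't identify protocol";

    /** Counts lsof lines that carry the "can't identify protocol" marker. */
    static long countLeaked(List<String> lsofLines) {
        return lsofLines.stream()
                .filter(line -> line.contains(LEAK_MARKER))
                .count();
    }

    /** True once the open-descriptor count reaches the given fraction of the limit. */
    static boolean shouldRestart(long openDescriptors, long limit, double fraction) {
        return openDescriptors >= limit * fraction;
    }
}
```

With a limit of 8192 and a fraction of 0.9, `shouldRestart` becomes true at 7373 open descriptors. How the restart itself is triggered is left to the surrounding script.
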
any ideas?

--
jules cisek | jules@luminate.com
