thrift-user mailing list archives

From "Roger Meier" <ro...@bufferoverflow.ch>
Subject RE: non-blocking servers are leaking sockets
Date Wed, 22 Jan 2014 22:01:34 GMT
You need to catch the IOException thrown by TNonblockingSocket during read
within your application; see here:
https://git-wip-us.apache.org/repos/asf/thrift/repo?p=thrift.git;a=blob;f=lib/java/src/org/apache/thrift/transport/TNonblockingSocket.java;h=482bd149ab0a993e90315e4f719d0903c89ac1f0;hb=HEAD#l140

The Thrift library does not know what to do about network problems or similar
issues that can cause a read to fail in your environment.
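
On the client side, for example, that exception surfaces in the async
callback's onError; a minimal sketch (MyService, ping(), host and port are
placeholders, and the exact callback generics depend on your Thrift version):

import org.apache.thrift.async.AsyncMethodCallback;
import org.apache.thrift.async.TAsyncClientManager;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TNonblockingSocket;

public class PingOnce {
  public static void main(String[] args) throws Exception {
    // placeholders: MyService, ping(), host and port are not from this thread
    final TNonblockingSocket transport = new TNonblockingSocket("thrift-host", 9090);
    MyService.AsyncClient client = new MyService.AsyncClient(
        new TBinaryProtocol.Factory(), new TAsyncClientManager(), transport);

    client.ping(new AsyncMethodCallback<MyService.AsyncClient.ping_call>() {
      public void onComplete(MyService.AsyncClient.ping_call response) {
        transport.close();   // done with this call, release the socket
      }
      public void onError(Exception e) {
        // the IOException thrown in TNonblockingSocket.read() ends up here;
        // close the transport so the fd is released instead of leaking
        transport.close();
      }
    });
  }
}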

;-r

-----Original Message-----
From: Jules Cisek [mailto:jules@luminate.com] 
Sent: Wednesday, 22 January 2014 21:10
To: user@thrift.apache.org
Subject: Re: non-blocking servers are leaking sockets

this service actually needs to respond in under 100ms (and usually does in
less than 20) so a short delay is just not possible.

on the server, i see a lot of this in the logs:

14/01/22 19:15:27 WARN Thread-3 server.TThreadedSelectorServer: Got an IOException in internalRead!
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:251)
        at sun.nio.ch.IOUtil.read(IOUtil.java:224)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:254)
        at org.apache.thrift.transport.TNonblockingSocket.read(TNonblockingSocket.java:141)
        at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.internalRead(AbstractNonblockingServer.java:515)
        at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.read(AbstractNonblockingServer.java:305)
        at org.apache.thrift.server.AbstractNonblockingServer$AbstractSelectThread.handleRead(AbstractNonblockingServer.java:202)
        at org.apache.thrift.server.TThreadedSelectorServer$SelectorThread.select(TThreadedSelectorServer.java:576)
        at org.apache.thrift.server.TThreadedSelectorServer$SelectorThread.run(TThreadedSelectorServer.java:536)

(note that these resets happen when the async client doesn't get a response
from the server within the time set using client.setTimeout(m), which in our
case can be quite often, and we're ok with that)

i'm not sure why the thrift library feels it's necessary to log this stuff,
since clients drop connections all the time and should be expected to.
frankly, it makes me think this common error is not being properly handled
(although looking through the code, it does look like the SocketChannel
eventually gets close()'ed)

~j



On Mon, Jan 20, 2014 at 12:15 PM, Sammons, Mark <mssammon@illinois.edu> wrote:

> Hi, Jules.
>
> I'm not sure my problems are completely analogous to yours, but I had 
> a situation where a client program making many short calls to a remote 
> thrift server was getting a "no route to host" exception after some 
> number of calls, and it appeared to be due to slow release of closed 
> sockets.  I found that adding a short (20ms) delay between calls 
> resolved the problem.
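>
> Something along these lines (a toy sketch; Request and callService() are
> just placeholders for the real generated client call):
>
>     void callAllWithDelay(List<Request> batch) throws InterruptedException {
>         for (Request req : batch) {
>             callService(req);        // placeholder for the real thrift call
>             Thread.sleep(20);        // short pause so closed sockets get reaped
>         }
>     }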
>
> I realize this is not exactly a solution, but it has at least allowed 
> me to keep working...
>
> Regards,
>
> Mark
>
> ________________________________________
> From: Jules Cisek [jules@luminate.com]
> Sent: Monday, January 20, 2014 12:39 PM
> To: user@thrift.apache.org
> Subject: non-blocking servers are leaking sockets
>
> i'm running java TThreadedSelectorServer and THsHaServer based servers 
> and both seem to be leaking sockets (thrift 0.9.0)
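>
> for reference, the selector server is brought up more or less like this (a
> rough sketch; MyService, MyHandler, and the port are placeholders, and this
> would sit in a method that declares throws TTransportException):
>
>     TNonblockingServerSocket socket = new TNonblockingServerSocket(9090);
>     TThreadedSelectorServer server = new TThreadedSelectorServer(
>         new TThreadedSelectorServer.Args(socket)
>             .processor(new MyService.Processor<MyService.Iface>(new MyHandler())));
>     server.serve();   // blocks; handles clients on non-blocking sockets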
>
> googling around for answers i keep running into
> https://issues.apache.org/jira/browse/THRIFT-1653 which puts the blame 
> on the server's TCP config while acknowledging that a problem in the 
> application layer may exist (see the last entry)
>
> i prefer not to mess with the TCP config on the machine because it is 
> used for various tasks; also, i did not have these issues with a 
> TThreadPoolServer and a TSocket (blocking + TBufferedTransport) or with any 
> non-thrift server on the same machine.
>
> what happens is i get a bunch of TCP connections in a CLOSE_WAIT state 
> and these remain in that state indefinitely.  but what is even more 
> concerning is that i get many sockets that don't show up in netstat at 
> all and only lsof can show me that they exist.  on Linux lsof shows them 
> as "can't identify protocol".  according to 
> https://idea.popcount.org/2012-12-09-lsof-cant-identify-protocol/ 
> these sockets are in a "half closed state" and the linux kernel has no 
> idea what to do with them.
>
> i'm pretty sure there's a problem with misbehaving clients, but the 
> server should not leak resources because of a client-side bug.
>
> my only recourse is to run a cronjob that looks at the lsof output and 
> restarts the server whenever the socket count gets dangerously close 
> to "too many open files" (8192 in my case)
>
> any ideas?
>
> --
> jules cisek | jules@luminate.com
>



--
jules cisek | jules@luminate.com

