thrift-user mailing list archives

From David Reiss <dre...@facebook.com>
Subject Re: erlang server/client closing connections
Date Fri, 13 Aug 2010 18:49:13 GMT
I did a major refactor of the Erlang library that I think might resolve this
issue.  https://issues.apache.org/jira/browse/THRIFT-599  With my patch,
thrift_buffered_transport is no longer a separate process, so there is no
need for a gen_server call.  This patch hasn't been committed yet because
at the time I posted it, Facebook hadn't deployed it in production anywhere.
We have now, though, so if people want, I can check it in.
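To give a rough idea of the change (a sketch only, not the actual THRIFT-599
code; see the JIRA for the real patch): the buffered transport becomes plain
data, so a read is a direct function call into the wrapped transport rather
than a gen_server:call into a separate process:

-record(buffered_transport, {wrapped}).

read(Transport = #buffered_transport{wrapped = Wrapped}, Len)
  when is_integer(Len) ->
    %% direct call into the wrapped (e.g. socket) transport;
    %% no message passing, so no call timeout to hit
    {NewWrapped, Result} = thrift_transport:read(Wrapped, Len),
    {Transport#buffered_transport{wrapped = NewWrapped}, Result}.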

--David

On 08/13/2010 11:08 AM, Anthony Molinaro wrote:
> Okay, another update: the problem is the recv_timeout, and it's almost possible
> to get it to work the way I want it to, but it requires a hack.
> 
> I switched to thrift_socket_server.  For those who were using thrift_server:
> instead of creating the server like
> 
> thrift_server:start_link(Port, ServiceModule, HandlerModule).
> 
> you do the following
> 
> thrift_socket_server:start ([{port, Port},
>                              {service, ServiceModule},
>                              {handler, HandlerModule}]).
> 
> By default the recv_timeout is set to 500 ms, so the connections shut down
> almost immediately; you can set a higher recv_timeout like this:
> 
> thrift_socket_server:start ([{port, Port},
>                              {service, ServiceModule},
>                              {handler, HandlerModule},
>                              {socket_opts, [{recv_timeout, 60*60*1000}]}]).
> 
> However, then you get a timeout on gen_server:call/3, which crashes the
> process.  I tracked the timeout down to this call in
> thrift_buffered_transport.erl:
> 
> read(Transport, Len) when is_integer(Len) ->
>     gen_server:call(Transport, {read, Len}, _Timeout=10000).
> 
> So as a check I just changed 10000 to 60*60*1000, and connections now seem
> to stay around, at least through an hour of inactivity, which is fine for
> my testing.
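> 
> That is, the hacked version (for testing only) is just:
> 
> read(Transport, Len) when is_integer(Len) ->
>     gen_server:call(Transport, {read, Len}, _Timeout=60*60*1000).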
> 
> I think the appropriate fix would be to somehow expose that timeout value
> as an option to the server, maybe something like idle_timeout or
> read_timeout.  The trick is then getting it tunneled down to that call:
> currently thrift_buffered_transport doesn't accept any options.  The
> timeout could be added as a third parameter to read, but that would have
> to happen in all the transports, which looking through the code doesn't
> seem that bad.  The only minor issue is thrift_socket_transport, which
> already uses recv_timeout as a read timeout, so there you'd have two
> timeouts to choose from.  Also, you'd still need to get that timeout to
> the places where read is called; see the sketch below.
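> 
> Roughly what I have in mind (a sketch only; neither read/3 nor a
> read_timeout option exists yet):
> 
> %% thrift_buffered_transport.erl: let the caller supply the timeout
> read(Transport, Len, Timeout) when is_integer(Len), is_integer(Timeout) ->
>     gen_server:call(Transport, {read, Len}, Timeout).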
> 
> Well, I'm not sure if it's worth it or not.  If I have a chance I can hack
> at it, but for the moment I have to finish some things off, so I'll have
> to get back to this later.
> 
> -Anthony
> 
> On Fri, Aug 13, 2010 at 12:12:49AM -0700, Anthony Molinaro wrote:
>>
>> On Thu, Aug 12, 2010 at 10:41:50PM -0700, David Reiss wrote:
>>> Usually, this sort of thing happens because the server has a recv timeout
>>> set.  I see that thrift_socket_server sets a recv timeout, but I can't tell
>>> if thrift_server is doing so.  One possibility might be to put some debugging
>>> code in thrift_processor to determine if it is terminating and closing the
>>> connection.
>>
>> So looking again, it looks like I was mistaken about keepalive being true.
>> It's inherited from the listen process, but there doesn't seem to be a way
>> to pass options in (this is for the thrift_server).  I hardcoded it and
>> passed the option to the client, but it doesn't seem to help.
>> So a receive timeout might be the problem, as I create connections at startup
>> but in my dev env don't really use them for a while.  So if the server decides
>> the client isn't going to send anything, it might close down its connection.
>> I tried to dig down and see this happen, but I don't see the processor
>> break out of its loop; I dropped in some io:formats, but none of the
>> branches of the case seem to trigger, so I'm not certain what is happening.
>> I think I'll have to see if I can trace it and see what I find.
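>>
>> (Probably just the usual dbg incantation, something like
>>
>> dbg:tracer(),
>> dbg:p(all, c),
>> dbg:tpl(thrift_processor, '_', [{'_', [], [{return_trace}]}]).
>>
>> to watch where the processor loop exits.)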
>>
>>> I'm not sure if thrift_server is supposed to be deprecated in favor of
>>> thrift_socket_server.  Chris Piro or Todd Lipcon might know.
>>
>> I got this usage of thrift_server from Todd's thrift_erl_skel, but
>> maybe it's out of date.  I'll take a look at thrift_socket_server
>> tomorrow to see what it looks like.  A quick glance and it looks very
>> different from thrift_server.
>>
>> I may just try to rewrite my pooling mechanism so that instead of
>> starting processes when my server starts, I start them the first time a
>> request is made.  The only problem is that since the only way for the
>> client to know the server has hung up on it is to make a call, I'll have
>> to retry: if I create a process, stick it into the pool for reuse, pull
>> it out a few seconds later, and get an exception, I then have to
>> re-connect and rerun the call :(
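>>
>> Roughly the retry I have in mind (a sketch; new_pool_member/0 stands in
>> for whatever re-creates and re-pools a connection):
>>
>> call_with_retry(Client, Function, Args) ->
>>     try thrift_client:call(Client, Function, Args)
>>     catch
>>         %% the client process dies on {error,closed}, so catch the exit,
>>         %% reconnect, and rerun the call once
>>         _:_ ->
>>             NewClient = new_pool_member(),
>>             thrift_client:call(NewClient, Function, Args)
>>     end.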
>>
>> -Anthony
>>  
>>> On 08/12/2010 10:22 PM, Anthony Molinaro wrote:
>>>> Hi,
>>>>
>>>>   I'm trying to use pg2 to cache several thrift client connections so I
>>>> can spread load across them.  This seems to work great; however, the
>>>> connections seem to go stale.  I think the server is dropping them, but
>>>> looking through the thrift code it seems like keepalive is true, so I'm
>>>> not sure why this would be the case.
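>>>>
>>>> For reference, the pooling is roughly this (a sketch; thrift_pool is
>>>> just my own group name, and the host, port, and my_service_thrift are
>>>> placeholders for the real service):
>>>>
>>>> ok = pg2:create(thrift_pool),
>>>> {ok, C} = thrift_client:start_link("localhost", 9090, my_service_thrift),
>>>> ok = pg2:join(thrift_pool, C),
>>>> %% later, per request:
>>>> Client = pg2:get_closest_pid(thrift_pool).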
>>>>
>>>> I start my server with
>>>>
>>>> thrift_server:start_link/3
>>>>
>>>> and the client processes are started with
>>>>
>>>> thrift_client:start_link/3
>>>>
>>>> The process stays alive fine on the client, but goes away after about
>>>> 30 seconds or so on the server (probably less; they seem to go away
>>>> quickly).  Since the client is still alive, when I make a call I get
>>>> this exception:
>>>>
>>>> {{case_clause,{error,closed}},
>>>>  [{thrift_client,read_result,3},
>>>>   {thrift_client,catch_function_exceptions,2},
>>>>   {thrift_client,handle_call,3},
>>>>   {gen_server,handle_msg,5},
>>>>   {proc_lib,init_p_do_apply,3}]}
>>>>
>>>> Is there any way to keep this from happening?
>>>>
>>>> Thanks,
>>>>
>>>> -Anthony
>>>>
>>
>> -- 
>> ------------------------------------------------------------------------
>> Anthony Molinaro                           <anthonym@alumni.caltech.edu>
> 
