thrift-user mailing list archives

From Nathan Marz <nathan.m...@gmail.com>
Subject Re: TNonblockingServer is dying with message "Totally Fucked"
Date Thu, 21 Jan 2010 01:27:30 GMT
OK... jumped into gdb and here's what I found:

(gdb) s
483      event_set(&event_, socket_, eventFlags_, TConnection::eventHandler, this);
(gdb) p appState_
$8 = apache::thrift::server::APP_INIT
(gdb) s
484      event_base_set(server_->getEventBase(), &event_);
(gdb) p appState_
$9 = 128
(gdb) s
487      if (event_add(&event_, 0) == -1) {
(gdb) p appState_
$10 = 128
(gdb) s
490    }
(gdb) p appState_
$11 = 130

It appears to be getting corrupted twice: once during "event_set" (gdb's "s"
prints the line it is about to run, so the jump to 128 comes from line 483)
and once during "event_add". Any ideas?
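One hypothesis that fits the transcript, though nothing in this thread confirms it: 128 and 130 are 0x80 and 0x82, which equal libevent's EVLIST_INIT and EVLIST_INIT | EVLIST_INSERTED. event_set() stores EVLIST_INIT in the event's flags field and event_add() ORs in EVLIST_INSERTED. The values would therefore line up if the libevent library in use believed struct event was larger than the definition Thrift was compiled against, so that its flags field sits on top of the member following event_. The C++ sketch below models that situation with a plain byte buffer. The struct sizes, the offset, and every name in it (ConnectionBytes, fakeEventSet, fakeEventAdd and so on) are invented for illustration and are not Thrift or libevent code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// libevent flag values (from event.h): event_set() stores EVLIST_INIT in the
// event's flags field, and event_add() ORs in EVLIST_INSERTED.
constexpr std::int32_t kEvlistInserted = 0x02;
constexpr std::int32_t kEvlistInit = 0x80;

// Invented sizes for illustration only. The caller's header says the event
// struct is 16 bytes; the library was built from a header whose struct is
// larger and keeps its flags field at byte offset 16.
constexpr std::size_t kCallerEventSize = 16;
constexpr std::size_t kLibraryFlagsOffset = 16;

// Caller-side layout of a connection: the event struct, then appState.
struct ConnectionBytes {
  unsigned char raw[kCallerEventSize + sizeof(std::int32_t)] = {};
};

// What the caller sees when it reads its appState member.
std::int32_t readAppState(const ConnectionBytes& conn) {
  std::int32_t value;
  std::memcpy(&value, conn.raw + kCallerEventSize, sizeof value);
  return value;
}

// Library-side accessors: they only receive a pointer to the event and use
// the library's own idea of where the flags field lives.
std::int32_t libraryGetFlags(const unsigned char* event) {
  std::int32_t flags;
  std::memcpy(&flags, event + kLibraryFlagsOffset, sizeof flags);
  return flags;
}

void librarySetFlags(unsigned char* event, std::int32_t flags) {
  std::memcpy(event + kLibraryFlagsOffset, &flags, sizeof flags);
}

// Stand-ins for the two library calls that write the flags field.
void fakeEventSet(unsigned char* event) { librarySetFlags(event, kEvlistInit); }

void fakeEventAdd(unsigned char* event) {
  librarySetFlags(event, libraryGetFlags(event) | kEvlistInserted);
}
```

Starting from a default-constructed ConnectionBytes, readAppState() returns 0, then 128 after fakeEventSet(conn.raw), then 130 after fakeEventAdd(conn.raw), the same sequence gdb showed. If this is the cause, a reasonable thing to check would be whether the libevent headers used to build Thrift and scribe match the libevent library actually loaded at runtime (the log in the quoted message reports libevent 1.3e).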



On Wed, Jan 20, 2010 at 4:03 PM, David Reiss <dreiss@facebook.com> wrote:

> So you're saying that this happens on the first received message?
> Should be relatively easy to debug.
>
> 1/ Make a debug build of Thrift and Scribe.
> 2/ Put a breakpoint in the constructor of TConnection.
> 3/ When the breakpoint hits, get the address of the appState_.
> 4/ Put a watchpoint on the contents of that address.  If possible,
>   make it conditional on the new value not being one of the valid
>   enum values.
> 5/ Continue.
> 6/ When the watchpoint triggers (and the new value is not a valid enum
>   value), do a backtrace to find out how it was corrupted.  Usually it is
>   a memory error.
>
> If it is a memory error, it might be more efficient to just run it under
> valgrind.
>
> --David
>
> Nathan Marz wrote:
> > Could use some help on this one. I'm running into this error when using
> > scribe, and I traced the error back to TNonblockingServer. Here's the tail
> > of the log:
> >
> > Thrift: Wed Jan 20 23:11:06 2010 libevent 1.3e method epoll
> > Thrift: Wed Jan 20 23:14:08 2010 Totally Fucked. Application State 130
> > scribed: src/server/TNonblockingServer.cpp:430: void apache::thrift::server::TConnection::transition(): Assertion `0' failed.
> >
> > In the code, this message is printed whenever a switch statement doesn't
> > match any of the cases.
> >
> > I have scribe set up to have a "master" log server which aggregates all
> > logs, and the "client" servers simply forward messages to the master.
> > The clients work fine; it's the master that crashes whenever it receives
> > a message. In case it's helpful, here are my scribe confs for
> > master/client:
> >
> > master:
> >
> > port=1464
> >
> >
> > <store>
> > category=default
> > type=file
> > rotate_period=hourly
> > add_newlines=1
> > create_symlink=yes
> > file_path=/vol/scribe
> > base_filename=thisisoverwritten
> > fs_type=std
> > </store>
> >
> > client:
> >
> > port=1464
> >
> >
> > <store>
> > category=default
> > type=buffer
> >
> > target_write_size=20480
> > max_write_interval=1
> > buffer_send_rate=1
> > retry_interval=120
> > retry_interval_range=60
> >
> > <primary>
> > type=network
> > remote_host=XXX
> > remote_port=1464
> > </primary>
> >
> > <secondary>
> > type=file
> > fs_type=std
> > file_path=/mnt/scribe
> > base_filename=thisisoverwritten
> > max_size=300000000
> > </secondary>
> > </store>
> >
> >
> >
> >
>
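The assertion in the quoted log comes from the default branch of a switch over the connection state, and David's step 4 asks for a watchpoint that only fires on values outside the enum. A minimal C++ sketch of both pieces follows. The enum is a simplified stand-in modeled on the APP_INIT name from the gdb session, and isValidAppState() and transition() here are hypothetical helpers, not Thrift's implementation. In gdb, a predicate like this is what would go after `if` in a conditional watchpoint (`watch <expr> if <condition>`).

```cpp
#include <cassert>
#include <string>

// Simplified state enum modeled on the APP_* names seen in this thread;
// the real Thrift enum may have different members.
enum AppState {
  APP_INIT,
  APP_READ_FRAME_SIZE,
  APP_READ_REQUEST,
  APP_WAIT_TASK,
  APP_SEND_RESULT
};

// Predicate suitable for a conditional watchpoint: true only for values that
// the switch below has a case for.
bool isValidAppState(int value) {
  return value >= APP_INIT && value <= APP_SEND_RESULT;
}

// Shape of a transition() switch. Valid states each have a case; anything
// else falls into default, which is where the real server prints the
// "Application State N" message and then asserts.
std::string transition(int appState) {
  switch (appState) {
    case APP_INIT:            return "APP_INIT";
    case APP_READ_FRAME_SIZE: return "APP_READ_FRAME_SIZE";
    case APP_READ_REQUEST:    return "APP_READ_REQUEST";
    case APP_WAIT_TASK:       return "APP_WAIT_TASK";
    case APP_SEND_RESULT:     return "APP_SEND_RESULT";
    default:
      // Report the raw value instead of calling assert(0), so the bad path
      // can be exercised without aborting.
      return "invalid application state " + std::to_string(appState);
  }
}
```

With these definitions, transition(130) returns "invalid application state 130", and isValidAppState() is false for both 128 and 130, the two corrupted values seen in the gdb session.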



-- 
Nathan Marz
Twitter: @nathanmarz
http://nathanmarz.com
