giraph-dev mailing list archives

From "Dmitriy V. Ryaboy (Commented) (JIRA)" <>
Subject [jira] [Commented] (GIRAPH-37) Implement Netty-backed rpc solution
Date Mon, 17 Oct 2011 04:58:12 GMT


Dmitriy V. Ryaboy commented on GIRAPH-37:

Ok, I emailed Marius Eriksen (Finagle lead, among other things), and here's his feedback so far:

that's great! (that they're doing this). would be happy to help in any way to make it work.

> 1) why a custom thrift compiler? makes distribution of code hard, have
> to make devs install that

this sucks, but it's sadly necessary (unless we can get our work integrated with the standard
thrift stack). we do require custom codegen in order to interface with the finagle thrift

we now actually have our own entirely-in-JVM codegenerator, that parses thrift IDL, etc.--
so at the very least we'll have something portable that also shouldn't require any installation--
presumably the various build systems can download them as a build-only dependency, etc. we're
using this internally for a few projects already, but still working out how to widely distribute

> 2) gigantic hard to understand stack traces

that's mostly a fact of life, sadly. i mean, with any asynchronous system you have much less
context in your stack traces generally, but with proliferation of anonymous closures in the
finagle codebase, it's often made even worse.

a few things here: (1) as of 1.9.3 (i notice this patch uses 1.9.0) stacks are now unwound
per responder per thread. this means roughly the stacks you observe will only ever be one
callback deep. now this might be even worse in terms of debugging, but it does produce cleaner/smaller
stack traces.

debuggability is a big concern (both for finagle, and for general use of Futures). one interesting
difference between asynchronous systems and synchronous ones is that stack traces don't tell
the story, or may tell only part of the story. really what you want is a dispatch *graph*.
we have a mechanism in twitter futures (called Locals-- they're like thread locals but instead
they're local to the dispatch graph) where we can record dispatches. this would now give us our
graph. a little weird, maybe, but certainly something that would be very helpful in many circumstances.
i'm still toying around with how to expose them (eg. we could synthesize a stack that's really
a topological sort of the dispatch graph in all exceptions encoded by finagle…)
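
To make the "dispatch graph" idea concrete, here is a minimal Java sketch of recording which operation dispatched which, then flattening the graph into a topological order that could be printed like a synthesized stack. This is not Finagle's Locals mechanism: the class DispatchGraph, its record() method, and the operation names are all hypothetical and exist only for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not part of Finagle or Giraph. */
public class DispatchGraph {
  // Adjacency list: each operation maps to the operations it dispatched.
  private final Map<String, List<String>> edges =
      new LinkedHashMap<String, List<String>>();

  /** Record that operation "from" dispatched operation "to". */
  public void record(String from, String to) {
    node(from).add(to);
    node(to); // make sure the target is known even if it dispatches nothing
  }

  private List<String> node(String name) {
    List<String> out = edges.get(name);
    if (out == null) {
      out = new ArrayList<String>();
      edges.put(name, out);
    }
    return out;
  }

  /** Kahn's algorithm: every dispatcher is listed before what it dispatched. */
  public List<String> topologicalOrder() {
    Map<String, Integer> inDegree = new LinkedHashMap<String, Integer>();
    for (String n : edges.keySet()) {
      inDegree.put(n, 0);
    }
    for (List<String> outs : edges.values()) {
      for (String to : outs) {
        inDegree.put(to, inDegree.get(to) + 1);
      }
    }
    Deque<String> ready = new ArrayDeque<String>();
    for (Map.Entry<String, Integer> e : inDegree.entrySet()) {
      if (e.getValue() == 0) {
        ready.add(e.getKey());
      }
    }
    List<String> order = new ArrayList<String>();
    while (!ready.isEmpty()) {
      String n = ready.remove();
      order.add(n);
      for (String to : edges.get(n)) {
        int remaining = inDegree.get(to) - 1;
        inDegree.put(to, remaining);
        if (remaining == 0) {
          ready.add(to);
        }
      }
    }
    if (order.size() != edges.size()) {
      throw new IllegalStateException("cycle in dispatch graph");
    }
    return order;
  }

  public static void main(String[] args) {
    DispatchGraph g = new DispatchGraph();
    g.record("request", "lookup");
    g.record("request", "fetch");
    g.record("lookup", "merge");
    g.record("fetch", "merge");
    System.out.println(g.topologicalOrder());
  }
}
```

Unlike a stack trace, the order keeps both branches that led to "merge", which is the part of the story a single callback-deep stack would drop.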

> 3) some stability issues, apparently

i looked at his patch briefly.  this part is suspect (the fact that he throws in a callback).

+    @Override
+    public void onFailure(Throwable cause) {
+      cdl.countDown();
+      throw new RuntimeException("Hit exception in proxied call", cause);
+    }
and would cause that exception to be thrown. it's actually harmless in terms of functionality,
but it will report the wrong underlying reason.

none of the user provided handlers should throw exceptions. at the same time, the fact that
it's reported as "result set multiple times" may indicate a bug somewhere. i'm going to look
into that probably by ~wed or so (my schedule is pretty filled up until then).
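
One way to follow the "handlers should not throw" advice is to have the callback record the failure and release the latch, and to rethrow on the thread that is waiting for the result. The sketch below assumes a callback shape like the one in the patch; ResultHandler and LatchedHandler are hypothetical stand-ins, not Finagle or Giraph types.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class RecordingCallback {
  /** Hypothetical stand-in for the RPC library's callback interface. */
  interface ResultHandler<T> {
    void onSuccess(T value);
    void onFailure(Throwable cause);
  }

  /** Handler that never throws: it stores the outcome and releases the waiter. */
  static final class LatchedHandler<T> implements ResultHandler<T> {
    private final CountDownLatch done = new CountDownLatch(1);
    private final AtomicReference<T> value = new AtomicReference<T>();
    private final AtomicReference<Throwable> failure =
        new AtomicReference<Throwable>();

    @Override
    public void onSuccess(T v) {
      value.set(v);
      done.countDown();
    }

    @Override
    public void onFailure(Throwable cause) {
      failure.set(cause); // keep the original cause
      done.countDown();   // wake the caller; do not throw from the callback
    }

    /** Runs on the waiting thread, so throwing here is safe. */
    T await(long timeout, TimeUnit unit) throws InterruptedException {
      if (!done.await(timeout, unit)) {
        throw new IllegalStateException("Timed out waiting for proxied call");
      }
      Throwable cause = failure.get();
      if (cause != null) {
        throw new RuntimeException("Hit exception in proxied call", cause);
      }
      return value.get();
    }
  }

  public static void main(String[] args) throws Exception {
    LatchedHandler<String> handler = new LatchedHandler<String>();
    handler.onFailure(new java.io.IOException("connection reset"));
    try {
      handler.await(1, TimeUnit.SECONDS);
    } catch (RuntimeException e) {
      System.out.println(e.getCause().getMessage());
    }
  }
}
```

The caller still sees "Hit exception in proxied call", but with the real underlying cause attached, and the library's callback thread never has an exception escape from user code.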

it's difficult to debug what's going on there (2/3s successful runs) without getting some
stats out of the system, and/or diving deeper into the code. it sounds like perhaps the client
isn't tuned properly for the particular use case.

anyhow. in my experience, almost *all* debugging of these sorts of systems can be done by
looking at the client/server stats. and finagle exports a rich set of stats for both.

use the .reportTo() method in the builder to report to either ostrich or science/commons stats,
or provide your own StatsReceiver.

> Implement Netty-backed rpc solution
> -----------------------------------
>                 Key: GIRAPH-37
>                 URL:
>             Project: Giraph
>          Issue Type: New Feature
>            Reporter: Jakob Homan
>            Assignee: Jakob Homan
>         Attachments: GIRAPH-37-wip.patch
> GIRAPH-12 considered replacing the current Hadoop-based rpc method with Netty, but
went in another direction. I think there is still value in this approach, and will also look
at Finagle.
