giraph-dev mailing list archives

From "Roman Shaposhnik (JIRA)" <>
Subject [jira] [Updated] (GIRAPH-800) Resolving mutations on a large graph causes timeouts
Date Fri, 06 Jun 2014 22:10:03 GMT


Roman Shaposhnik updated GIRAPH-800:

    Fix Version/s:     (was: 1.1.0)

> Resolving mutations on a large graph causes timeouts
> ----------------------------------------------------
>                 Key: GIRAPH-800
>                 URL:
>             Project: Giraph
>          Issue Type: Bug
>          Components: graph
>    Affects Versions: 1.1.0
>         Environment: hadoop1
>            Reporter: Craig Muchinsky
>         Attachments: GIRAPH-800.patch
> When processing a graph with a large number of mutations and/or a large number of messages
per superstep, the pre-superstep logic can appear to hang, and the job eventually times out,
either because of MapReduce task inactivity or because it hits the maximum superstep wait.
> While it's possible to tune around this by adding a strategic call to context.progress()
in NettyServerWorker.resolveMutations() and bumping up the giraph.maxMasterSuperstepWaitMsecs
setting, it would seem this part of the code might need some optimization.
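
A minimal sketch of the "strategic call to context.progress()" idea described above, written as a self-contained toy rather than as Giraph code. The ProgressReporter interface is a hypothetical stand-in for the Hadoop task context, and ProgressingLoop, forEachWithProgress, and the interval parameter are illustrative names that do not appear in the issue or in Giraph.

```java
import java.util.function.Consumer;

/** Hypothetical stand-in for the Hadoop task context; only progress() is modeled. */
interface ProgressReporter {
    void progress();
}

final class ProgressingLoop {
    private ProgressingLoop() {}

    /**
     * Applies work to every item and calls reporter.progress() once after
     * every `interval` processed items, so a long loop keeps signalling liveness.
     *
     * @return the number of progress reports that were sent
     */
    static <T> int forEachWithProgress(Iterable<T> items,
                                       int interval,
                                       ProgressReporter reporter,
                                       Consumer<? super T> work) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        int processed = 0;
        int reports = 0;
        for (T item : items) {
            work.accept(item);
            processed++;
            // Report only every `interval` items to keep the overhead small.
            if (processed % interval == 0) {
                reporter.progress();
                reports++;
            }
        }
        return reports;
    }
}
```

In a real worker the reporter would presumably wrap the MapReduce context, and the interval would need to be small enough that reports arrive well inside the task timeout. This only keeps the task alive; it does not make the resolve step itself any faster.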
> As an example, in a graph with 2B vertices and 2.5B edges the transition between supersteps
with 1B messages in flight can take 15-30 minutes on a cluster with 228 workers (2 threads,
8GB RAM per worker).
> While the vertex resolve processing can be time consuming, I believe it's the check for
missing vertices (second loop within NettyServerWorker.resolveMutations()) that is the real
performance bottleneck. I haven't identified a fix to this logic yet, but I did identify
a possible workaround. I believe when dealing with a static and complete graph the resolveMutations()
call can be skipped altogether. A quick test of this theory yielded a 3x performance improvement
in my sandbox.
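
The workaround described above can be pictured with a small toy model. This is a sketch under stated assumptions, not the attached patch and not Giraph's implementation: MutationResolver, resolveMissingVertices, and the staticGraph flag are hypothetical names, and vertices are reduced to a set of long ids.

```java
import java.util.Collection;
import java.util.Set;

final class MutationResolver {
    private MutationResolver() {}

    /**
     * Toy version of a "missing vertex" check: every message destination that
     * is not yet a known vertex is added to vertexIds.
     *
     * @param staticGraph hypothetical flag; when true the caller asserts that
     *                    the graph is static and complete, so the scan is skipped
     * @return the number of vertices created by this call
     */
    static int resolveMissingVertices(Set<Long> vertexIds,
                                      Collection<Long> messageDestinations,
                                      boolean staticGraph) {
        if (staticGraph) {
            // Nothing can be missing by assumption, so avoid the full scan.
            return 0;
        }
        int created = 0;
        for (Long destination : messageDestinations) {
            // Set.add returns true only when the id was not already present.
            if (vertexIds.add(destination)) {
                created++;
            }
        }
        return created;
    }
}
```

The saving comes from skipping a scan whose cost grows with the number of in-flight messages. The trade-off is that the flag is only safe when the assumption holds: if a message targets a vertex that does not exist, this toy version leaves it unresolved.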

