giraph-dev mailing list archives

From Claudio Martella <claudio.marte...@gmail.com>
Subject Re: Introducing Graft: A debugging and testing tool for Giraph algorithms
Date Wed, 11 Jun 2014 13:14:37 GMT
In general, I think this is a cool idea which could be prototyped quickly,
e.g. by leveraging existing features like Giraph snapshots. These allow
us to write the state of the computation (e.g. the vertex values,
but also the messages) to HDFS during the computation, which we can then
read back together with the input graph (assuming a static graph). My
concern is that if you use Giraph, you probably have a graph that is really
large, even for Gephi. So I feel one of the questions around such a tool
would be how to sample the data effectively.
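To make the sampling concern concrete: a minimal sketch of picking a uniform subset of vertices from a snapshot stream could look like the following. This is purely illustrative; `VertexSampler` and its method names are hypothetical and not part of Giraph's API, and a real tool would likely sample neighborhoods rather than independent vertices.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical helper: reservoir-samples k vertex ids from a stream of
 * unknown length (e.g. vertex ids read back from an HDFS snapshot),
 * so a visualizer like Gephi only has to render a bounded subset.
 */
public class VertexSampler {
    public static List<Long> sample(Iterable<Long> vertexIds, int k, long seed) {
        Random rnd = new Random(seed);
        List<Long> reservoir = new ArrayList<>(k);
        long seen = 0; // number of ids consumed so far (0-based index of current id)
        for (long id : vertexIds) {
            if (reservoir.size() < k) {
                reservoir.add(id); // fill the reservoir with the first k ids
            } else {
                // replace a random slot with probability k / (seen + 1)
                long j = (long) (rnd.nextDouble() * (seen + 1));
                if (j < k) {
                    reservoir.set((int) j, id);
                }
            }
            seen++;
        }
        return reservoir;
    }
}
```

Because the reservoir is updated in a single pass, this works even when the snapshot is too large to hold in memory; only the k sampled ids are kept.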

Anyway, I welcome Graft. Tools should really be at the top of our priorities
with Giraph at this point. On this topic, do you guys at Facebook (read
@Avery) plan to release your visualizer anytime soon? I've seen the slides
for your presentation at IWGDM, and there's a slide about a visualizer
called GiraphicJam there (slide 27).


On Wed, Jun 11, 2014 at 2:26 PM, Mirko Kämpf <mirko.kaempf@cloudera.com>
wrote:

> Hi,
>
> some time ago I started working on visualizing graph data stored in
> Hadoop via Gephi. A first draft of the results is in this blog post:
>
> http://blog.cloudera.com/blog/2014/05/how-to-manage-time-dependent-multilayer-networks-in-apache-hadoop/
> We found that handling the metadata for graphs and the appropriate
> input converters was the main problem that had to be solved. Now it is
> easy to retrieve edge and node lists, even for time-dependent graphs. The
> current solution works with Hive or Impala to retrieve the data via JDBC.
>
> But I think it would be great to have an API in Giraph that allows
> triggering a snapshot of the current state of a graph while it is being
> processed. After such a snapshot is done, the external tool loads this
> data, e.g. into Gephi. Maybe in a second step we can load the data
> directly from all worker nodes instead of from HDFS, but for a start it
> would be fine to use HDFS to decouple the processing layer and the GUI.
>
> For really large graphs, I think a Java applet using the
> "gephi-tools" project could do a great job of rendering them.
>
> The snapshot could be triggered via ZooKeeper. A job registers its ability
> to receive such an optional request, and via ZooKeeper a client can find
> all graphs to look into (based on such a snapshot) and then send the
> request. At the start of the next superstep, the job checks the snapshot
> status in ZooKeeper and either creates a snapshot or simply proceeds, and
> so on. This would even allow exporting time-dependent intermediate results
> from running graph algorithms without a restart.
>
> What do you think about such a feature? I think it is also related to the
> "graph-centric API" proposed a while ago.
> Is it worth a JIRA, and do you see use cases for this feature?
>
> Best wishes,
> Mirko
>
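The superstep handshake Mirko describes above (a client requests a snapshot; the job checks for the request at the start of each superstep and either snapshots or proceeds) could be sketched as follows. The `AtomicBoolean` is a stand-in for the ZooKeeper znode the proposal mentions, and `SnapshotCoordinator` is a hypothetical name; nothing here exists in Giraph or ZooKeeper.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Sketch of the proposed snapshot handshake. In the real design the flag
 * would be a ZooKeeper znode created by an external client and watched by
 * the job; here an in-memory AtomicBoolean stands in for it so the
 * protocol's shape is visible.
 */
public class SnapshotCoordinator {
    private final AtomicBoolean snapshotRequested = new AtomicBoolean(false);

    /** External client (e.g. a visualizer) asks for a snapshot. */
    public void requestSnapshot() {
        snapshotRequested.set(true);
    }

    /**
     * Called by the job at the start of each superstep: atomically
     * consumes a pending request, so each request yields one snapshot.
     */
    public boolean shouldSnapshot() {
        return snapshotRequested.getAndSet(false);
    }
}
```

The job would call `shouldSnapshot()` once per superstep, write its state to HDFS if it returns true, and otherwise proceed unchanged, which matches the "creates one or just proceeds" behavior described above.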



-- 
   Claudio Martella
