giraph-dev mailing list archives

From "Brian Femiano (Commented) (JIRA)" <>
Subject [jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats
Date Mon, 02 Apr 2012 21:15:40 GMT


Brian Femiano commented on GIRAPH-153:

Avery and Jakob, here's what I've got set up. I wanted to double-check this before moving
forward with the project template.

1) I have a subproject 'giraph-formats-contrib' under the giraph trunk that depends on giraph
0.2-SNAPSHOT. Since this is not yet hosted in Maven Central, I installed it to my local repo.
Note this is only necessary if you wish to build the subproject. Note also that this is not a
Maven submodule that builds as a dependency. It's entirely standalone.

2) The subproject hosts the Accumulo 1.4.0 and HBase 0.92.1 abstract input/output formats,
and any future derived implementations. 

3) I copied the BspCase JUnit class into the subproject redundantly. The subproject builds
and tests entirely standalone from the main giraph build, except for the giraph.jar dependency.
Unfortunately, the test classes are not included in the fat jar, so I copied one class into
the build for future unit testing.

I'm moving forward with the unit tests. If you guys think I should change anything, I'll
happily rework my structure. The main thing I strove for was total separation from the main
build. It simply uses Giraph as a jar dependency.
> HBase/Accumulo Input and Output formats
> ---------------------------------------
>                 Key: GIRAPH-153
>                 URL:
>             Project: Giraph
>          Issue Type: New Feature
>          Components: bsp
>    Affects Versions: 0.1.0
>         Environment: Single host OSX 10.6.8 2.2Ghz Intel i7, 8GB
>            Reporter: Brian Femiano
> Four abstract classes that wrap their respective delegate input/output formats for easy
> hooks into vertex input format subclasses. I've included some sample programs that show
> two very simple graph algorithms. I have a graph generator that builds out a very simple
> directed structure, starting with a few 'root' nodes. Root nodes are defined as nodes
> which are not listed as a child anywhere in the graph.
>
> Algorithm 1)  --> Accumulo as read/write source. Every vertex starts thinking it's a
> root. At superstep 0, send a message down to each child as a non-root notification.
> After superstep 1, only root nodes will have never been messaged.
>
> Algorithm 2) TableRootMarker --> HBase as read/write source. Expands on A1 by bundling
> the notification logic followed by root node propagation. Once we've marked the
> appropriate nodes as roots, tell every child which roots it can be traced back to via
> one or more spanning trees. This will take N + 2 supersteps where N is the maximum
> number of hops from any root to any leaf, plus 2 supersteps for the initial root flagging.
>
> I've included all relevant code plus for recursive cache file and archive searches. It
> is more hadoop centric than giraph, but these jobs use it so I figured why not commit here.
>
> These have been tested through local JobRunner, pseudo-distributed on the aforementioned
> hardware, and full distributed on EC2. More details in the comments.
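
For readers who want to see the two algorithms from the issue description in miniature, here is a rough sketch in plain Java. It is not the code attached to GIRAPH-153 and uses no Giraph, HBase, or Accumulo APIs; it only simulates the message passing over an in-memory adjacency map. The class name RootMarkerSketch, its methods, and the sample graph are all made up for illustration, and the sketch assumes every vertex (including leaves) appears as a key in the map.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, dependency-free simulation; not the patch's implementation.
class RootMarkerSketch {

    // Algorithm 1: every vertex starts out assuming it is a root. Each vertex
    // "messages" its children; any vertex that is messaged drops the root flag.
    static Set<String> findRoots(Map<String, List<String>> children) {
        Set<String> roots = new TreeSet<>(children.keySet());
        for (List<String> kids : children.values()) {
            for (String kid : kids) {
                roots.remove(kid); // kid received a non-root notification
            }
        }
        return roots;
    }

    // Algorithm 2: after flagging roots, push root IDs down the graph so each
    // vertex learns which roots it can be traced back to. One loop iteration
    // stands in for one superstep; a vertex forwards only newly learned IDs.
    // In this sketch a root records itself as its own root.
    static Map<String, Set<String>> propagateRoots(Map<String, List<String>> children) {
        Map<String, Set<String>> known = new HashMap<>();
        for (String v : children.keySet()) {
            known.put(v, new TreeSet<>());
        }
        // inbox maps a vertex to the root IDs delivered to it this superstep.
        Map<String, Set<String>> inbox = new HashMap<>();
        for (String root : findRoots(children)) {
            inbox.computeIfAbsent(root, k -> new TreeSet<>()).add(root);
        }
        while (!inbox.isEmpty()) {
            Map<String, Set<String>> next = new HashMap<>();
            for (Map.Entry<String, Set<String>> e : inbox.entrySet()) {
                Set<String> seen = known.computeIfAbsent(e.getKey(), k -> new TreeSet<>());
                Set<String> fresh = new TreeSet<>(e.getValue());
                fresh.removeAll(seen);
                if (fresh.isEmpty()) {
                    continue; // nothing new to record or forward
                }
                seen.addAll(fresh);
                List<String> kids = children.getOrDefault(e.getKey(), Collections.<String>emptyList());
                for (String kid : kids) {
                    next.computeIfAbsent(kid, k -> new TreeSet<>()).addAll(fresh);
                }
            }
            inbox = next;
        }
        return known;
    }

    // Made-up sample graph: a -> b, c; b -> d; c -> d; e -> d; d is a leaf.
    static Map<String, List<String>> sampleGraph() {
        Map<String, List<String>> g = new HashMap<>();
        g.put("a", Arrays.asList("b", "c"));
        g.put("b", Arrays.asList("d"));
        g.put("c", Arrays.asList("d"));
        g.put("e", Arrays.asList("d"));
        g.put("d", Collections.<String>emptyList());
        return g;
    }

    public static void main(String[] args) {
        Map<String, List<String>> g = sampleGraph();
        System.out.println("roots: " + findRoots(g));
        System.out.println("roots per vertex: " + new java.util.TreeMap<>(propagateRoots(g)));
    }
}
```

In the sample graph, a and e are never listed as a child, so they are the roots, and d ends up traced back to both of them. In a real job the adjacency map would come from the HBase or Accumulo table via the input format, and the loop body would live in the vertex's compute method.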


