tinkerpop-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "stephen mallette (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (TINKERPOP-1250) Support Subgraph-Centric GraphComputer
Date Wed, 06 Apr 2016 17:38:25 GMT

     [ https://issues.apache.org/jira/browse/TINKERPOP-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

stephen mallette updated TINKERPOP-1250:
    Issue Type: Improvement  (was: New Feature)

> Support Subgraph-Centric GraphComputer
> --------------------------------------
>                 Key: TINKERPOP-1250
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1250
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: process
>    Affects Versions: 3.2.0-incubating
>            Reporter: Marko A. Rodriguez
>              Labels: breaking
> Right now, {{GraphComputer}} and {{VertexPrograms}} are "vertex-centric." That is, the
boundary of an atomic unit of computation is a single vertex, its properties, and its incident
edges. We should support "subgraph-centric" computing where each worker's partition is loaded
in memory (RAM) as a connected subgraph. For those vertices that are not in its partition,
a special "reference vertex" is use to reference it. What this would mean is that when a {{Traverser}}
is processing, it can continue to evaluate as long as its within the subgraph partition. The
moment it references a vertex not in the partition (a "reference vertex"), it serializes itself
as a message.
> This would greatly increase the speed of Gremlin OLAP at the cost of requiring a large
amount of memory to store all the worker partition subgraphs in RAM. There might even be a
way to have a hybrid-model where some of the partition is held in RAM and the other is (even
though still in the same partition) is stored as "star vertices."
> How would this be added to {{GraphComputer}} in a backwards compatible way?
> {code}
> GraphComputer.supportsVertexCentricComputing)() // currently true for all implementations
> GraphComputer.supportsSubgraphCentricComputing()
> {code}
> A {{VertexProgram}} could then have the following method:
> {code}
> boolean VertexProgram.withinCentricity(final M message)
> {code}
> If the message is NOT within "centricity", then it is serialized and distributed, else
it continues to execute.
> I haven't thought through all the API and implementation considerations. Though it would
be good to make this backwards compatible and, moving forward, able to support "edge-centric
computing" and thus, have a very memory limited OLAP system.
> How to think of the different models:
> 1. Vertex-centric: medium speed, medium expressivity, medium memory costs.
> 2. Subgraph-centric: high speed, high expressivity, high memory costs.
> 3. Edge-centric: high speed, low expressivity, low memory costs.
> What is "expressivity"? Well, subgraph-centric computations can have "local traversals"
that move beyond the "star graph." Edge-centric would not support any `by()`-modulators as
the computation is bound to a single edge.  Thus, low expressivity. It really depends on how
you represent the edge-centric edge list. Do you have the vertices on each side of an edge
duplicated with all their properties? This wold still be low memory costs, but you could get
more expressivity.
> Anywho, in general it would be nice if the underlying execution engine can handle the
three common distributed graph computing paradigms. Subgraph-centric seems the easiest to
support at this point in time.

This message was sent by Atlassian JIRA

View raw message