giraph-dev mailing list archives

From "Jakob Homan (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (GIRAPH-83) Is Vertex correct yet?
Date Tue, 15 Nov 2011 00:04:51 GMT

    [ https://issues.apache.org/jira/browse/GIRAPH-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150094#comment-13150094 ]

Jakob Homan commented on GIRAPH-83:
-----------------------------------

Looking at the original Pregel paper, the Vertex instance has eight methods (compute, vertex_id,
superstep, GetValue, MutableValue, GetOutEdgeIterator, SendMessageTo and VoteToHalt). Currently,
BasicVertex has 24.  There are also three different types of Vertices (Vertex, MutableVertex
and BasicVertex) linked via inheritance and exposed to the users.  I'm wondering if this interface
is quite right yet.

There are two main concerns.  First, this is the contract users are starting to write applications
against and which we'll need to support for a long time, with as few tweaks as possible.
It'd be good to be relatively sure of its limits before we make an initial release.  Second,
the use of inheritance to join the user's implementation with the computation's state makes
it difficult to test.  How does one mock out the state that's fed into compute and verify
compute's result without starting up a cluster (either real or local; see GIRAPH-51)?

Would it be reasonable to strip out as many methods as possible from Vertex, particularly
those dealing with state external to the Vertex itself: 
* getSuperStep
* getNumVertices
* getNumEdges
* getMsgList/iterator
* getEdgeValue
* hasEdge
* sendMsg
* sendMsgToAllEdges
* (g|s)etGraphState
* getContext
* getWorkerContext
* registerAggregator
* useAggregator

The outEdges data structures are a bit odd in that they are intrinsic to the vertex itself
(in the mathematical sense), but are managed by the framework.  Separating these out as well
might be a bit clunky, but would be structurally more correct.
  
These methods and the state they manipulate could then be passed in as a Context (a new type
of Context, not one of the two others we have running around!) to the compute method.  This
moves compute() closer to a functional, testable model of computing across its input state
(which can be mocked out for testing and mangled as we evolve its innards).  The Vertex itself
could still of course maintain any state it would need, but like a Mapper, shouldn't need
much and would be discouraged from holding onto large amounts of data between computations.
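
To make the idea concrete, here is a rough sketch of what passing a Context into compute might
look like.  None of this is existing Giraph code: ComputeContext, MaxValueVertex, FakeContext
and getMessages are hypothetical names invented for illustration, and the method set is just
a small subset of the list above.  The point is only that compute can be exercised against
a hand-rolled fake without starting a cluster.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical: the framework-owned state and operations that compute() needs,
// pulled out of the Vertex class and handed in per call.
interface ComputeContext<M> {
    long getSuperStep();
    Iterable<M> getMessages();
    void sendMsgToAllEdges(M msg);
    void voteToHalt();
}

// Hypothetical user vertex: it keeps only its own value and inherits nothing
// from the framework.
class MaxValueVertex {
    private long value;

    MaxValueVertex(long initial) {
        this.value = initial;
    }

    long getValue() {
        return value;
    }

    // Adopt the largest value seen so far; announce it in superstep 0 and
    // whenever it changes, otherwise vote to halt.
    void compute(ComputeContext<Long> ctx) {
        boolean changed = ctx.getSuperStep() == 0;
        for (long msg : ctx.getMessages()) {
            if (msg > value) {
                value = msg;
                changed = true;
            }
        }
        if (changed) {
            ctx.sendMsgToAllEdges(value);
        } else {
            ctx.voteToHalt();
        }
    }
}

// Hand-rolled fake for unit tests: feeds in canned messages and records
// what compute() did with the context.
class FakeContext implements ComputeContext<Long> {
    private final long superstep;
    private final List<Long> incoming;
    final List<Long> sent = new ArrayList<>();
    boolean halted = false;

    FakeContext(long superstep, Long... incoming) {
        this.superstep = superstep;
        this.incoming = Arrays.asList(incoming);
    }

    @Override public long getSuperStep() { return superstep; }
    @Override public Iterable<Long> getMessages() { return incoming; }
    @Override public void sendMsgToAllEdges(Long msg) { sent.add(msg); }
    @Override public void voteToHalt() { halted = true; }
}
```

A test would then construct a MaxValueVertex, call compute with a FakeContext holding the
messages of interest, and check the vertex value plus the fake's sent and halted fields.
A mocking library could stand in for the hand-written fake just as well.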

Thoughts?
                
> Is Vertex correct yet?
> ----------------------
>
>                 Key: GIRAPH-83
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-83
>             Project: Giraph
>          Issue Type: Improvement
>            Reporter: Jakob Homan
>
> I'm seeing a number of people run into oddities with Vertex and am thinking we may not
> have it quite correct yet...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
