giraph-dev mailing list archives

From Eli Reisman <apache.mail...@gmail.com>
Subject Re: [jira] [Commented] (GIRAPH-328) Outgoing messages from current superstep should be grouped at the sender by owning worker, not by partition
Date Fri, 14 Sep 2012 18:06:13 GMT
Whew! That's good to know. Digging around in there, I had the feeling I might
be going down a blind alley. If you notice anything that won't work for a
final solution, drop me a line and I can start fixing/improving on this
until it starts looking like a real solution.


On Thu, Sep 13, 2012 at 4:26 PM, Avery Ching (JIRA) <jira@apache.org> wrote:

>
>     [
> https://issues.apache.org/jira/browse/GIRAPH-328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455431#comment-13455431]
>
> Avery Ching commented on GIRAPH-328:
> ------------------------------------
>
> This is the right behavior that you suggest.  I was going to do it, but
> glad that you're taking a cut!
>
> > Outgoing messages from current superstep should be grouped at the sender
> by owning worker, not by partition
> >
> -----------------------------------------------------------------------------------------------------------
> >
> >                 Key: GIRAPH-328
> >                 URL: https://issues.apache.org/jira/browse/GIRAPH-328
> >             Project: Giraph
> >          Issue Type: Improvement
> >          Components: bsp, graph
> >    Affects Versions: 0.2.0
> >            Reporter: Eli Reisman
> >            Assignee: Eli Reisman
> >            Priority: Minor
> >             Fix For: 0.2.0
> >
> >         Attachments: GIRAPH-328-1.patch
> >
> >
> > Currently, outgoing messages created by the Vertex#compute() cycle on
> each worker are stored and grouped by the partitionId on the destination
> worker to which the messages belong. This means a message bound for
> multiple partitions on the same receiving worker is duplicated on the wire
> once per partition that hosts destination vertices for it.
> > By grouping the outgoing, current-superstep messages by destination
> worker instead, we can split them into partitions when they are inserted
> into a MessageStore on the destination worker. What we trade in some
> compute time while inserting at the receiver side, we gain in fine-grained
> control over the real number of messages each worker caches outbound for
> any given worker before flushing, and over how those flushed messages are
> aggregated for delivery. Potentially, it allows for a great reduction in
> duplicate messages sent in situations like Vertex#sendMessageToAllEdges()
> -- see GIRAPH-322, GIRAPH-314. You get the idea.
> > This might be a poor idea, and it can certainly use some additional
> refinement, but it passes mvn verify and may even run ;) It interoperates
> with the disk spill code, but not as well as it could. Consider this a
> request for comment on the idea (and the approach) rather than a finished
> product.
> > Comments/ideas/help welcome! Thanks
>
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA
> administrators
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>
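The grouping change described in the issue can be sketched roughly as below. This is illustrative Java only, not Giraph's actual classes: WorkerKeyedMessageCache, the partitionOwner map, and the flush threshold are hypothetical stand-ins for the real partition-to-worker assignment and request-size logic. The point is simply that the outbound cache is keyed by destination worker, so messages to different partitions on the same worker share one buffer and one flush.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch of the GIRAPH-328 idea: cache outgoing messages keyed by the
 * destination worker rather than by the destination partition, splitting
 * them back into partitions only at the receiver's MessageStore.
 */
public class WorkerKeyedMessageCache {
    // Hypothetical mapping partitionId -> workerId; in Giraph this would
    // come from the master's partition-to-worker assignment.
    private final Map<Integer, Integer> partitionOwner;
    // Outgoing messages grouped by destination worker, not by partition.
    private final Map<Integer, List<String>> cache = new HashMap<>();
    private final int flushThreshold;

    public WorkerKeyedMessageCache(Map<Integer, Integer> partitionOwner,
                                   int flushThreshold) {
        this.partitionOwner = partitionOwner;
        this.flushThreshold = flushThreshold;
    }

    /**
     * Buffer one message for a destination partition. Returns the full
     * batch for that worker when the threshold is reached (the caller
     * would send it over the wire), or null if the message was buffered.
     */
    public List<String> addMessage(int destPartitionId, String message) {
        int workerId = partitionOwner.get(destPartitionId);
        List<String> batch =
            cache.computeIfAbsent(workerId, k -> new ArrayList<>());
        batch.add(message);
        if (batch.size() >= flushThreshold) {
            cache.remove(workerId);
            return batch;
        }
        return null;
    }

    /** Number of messages currently buffered for a given worker. */
    public int pendingFor(int workerId) {
        List<String> batch = cache.get(workerId);
        return batch == null ? 0 : batch.size();
    }

    public static void main(String[] args) {
        // Partitions 0 and 1 both live on worker 7; partition 2 on worker 8.
        Map<Integer, Integer> owners = new HashMap<>();
        owners.put(0, 7);
        owners.put(1, 7);
        owners.put(2, 8);
        WorkerKeyedMessageCache c = new WorkerKeyedMessageCache(owners, 3);

        // Messages to two different partitions share one worker-level buffer.
        c.addMessage(0, "m1");
        c.addMessage(1, "m2");
        System.out.println("pending for worker 7: " + c.pendingFor(7));
        List<String> flushed = c.addMessage(1, "m3"); // hits the threshold
        System.out.println("flushed batch size: "
            + (flushed == null ? 0 : flushed.size()));
    }
}
```

Under the old per-partition grouping, "m1" and "m3" above would sit in separate caches and trigger separate requests even though both go to worker 7; here they flush together, which is also what makes the sendMessageToAllEdges() deduplication mentioned in GIRAPH-322/314 possible.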
