giraph-dev mailing list archives

From "Avery Ching (Updated) (JIRA)" <>
Subject [jira] [Updated] (GIRAPH-11) Improve the graph distribution of Giraph
Date Mon, 14 Nov 2011 08:36:51 GMT


Avery Ching updated GIRAPH-11:

    Attachment:     (was: GIRAPH-11.3.diff)
> Improve the graph distribution of Giraph
> ----------------------------------------
>                 Key: GIRAPH-11
>                 URL:
>             Project: Giraph
>          Issue Type: Improvement
>    Affects Versions: 0.70.0
>            Reporter: Avery Ching
>            Assignee: Avery Ching
>         Attachments: GIRAPH-11.2.diff, GIRAPH-11.diff
> Currently, Giraph assumes that the data from the VertexInputFormat is sorted.  If the
> user data is not sorted by the vertex id, they must first run a MapReduce or Pig job to generate
> a sorted dataset.  This is often a bit inconvenient.
> Giraph graph partitioning is currently range based and there are some advantages and
> disadvantages of this approach.  The proposal of this JIRA would be to allow for both range
> and hash based partitioning and provide more flexibility to the user.
> Design goals for the graph distribution:
> * Allow vertices to be ordered or unordered
> * Ability to repartition
> * Select the partitioning scheme based on user needs (i.e. hash or range based)
> * Ability to provide user-specific hints about partitions
> Hash-based partitioning
> * Good vertex balancing across ranges for random data
> * Bad at vertex id locality
> Range-based partitioning
> * Good at vertex id locality
> * Ability to split ranges easily
> * Can cause hotspots for hot ranges
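
To make the trade-off between the two schemes concrete, here is a minimal Java sketch. It is not taken from the attached patches or from Giraph itself: the class name PartitionSketch, both method names, and the use of long vertex ids with an array of inclusive upper bounds are assumptions made purely for illustration.

```java
import java.util.Arrays;

// Hypothetical illustration only; none of these names come from Giraph.
final class PartitionSketch {

    // Hash-based: the partition is derived from the hash of the vertex id,
    // so neighboring ids usually land in different partitions.
    static int hashPartition(long vertexId, int partitionCount) {
        // floorMod keeps the result in [0, partitionCount) even for negative hashes.
        return Math.floorMod(Long.hashCode(vertexId), partitionCount);
    }

    // Range-based: upperBounds is sorted ascending and upperBounds[i] is the
    // largest id owned by partition i. Ids above the last bound fall into the
    // last partition (an assumption of this sketch).
    static int rangePartition(long vertexId, long[] upperBounds) {
        int pos = Arrays.binarySearch(upperBounds, vertexId);
        // A negative result encodes the insertion point as -(insertionPoint) - 1.
        int index = pos >= 0 ? pos : -pos - 1;
        return Math.min(index, upperBounds.length - 1);
    }

    public static void main(String[] args) {
        long[] upperBounds = {99L, 199L, 299L};
        for (long id = 100; id <= 103; id++) {
            System.out.println("id " + id
                + " hash=" + hashPartition(id, 4)
                + " range=" + rangePartition(id, upperBounds));
        }
    }
}
```

In this sketch, ids 100 through 103 all map to the same range partition, which preserves vertex id locality but would concentrate load if that range were hot. The hash variant spreads the same ids over several partitions. Splitting a range here would amount to inserting one more bound into the sorted array.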
