[ https://issues.apache.org/jira/browse/FLINK-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15232197#comment-15232197 ]
ASF GitHub Bot commented on FLINK-2909:

Github user vasia commented on a diff in the pull request:
https://github.com/apache/flink/pull/1807#discussion_r59026824
 Diff: docs/apis/batch/libs/gelly.md 
@@ -1734,3 +1734,547 @@ vertex represents a group of vertices and each edge represents
a group of edges
vertex and edge in the output graph stores the common group value and the number of represented
elements.
{% top %}
+
+Graph Generators
+
+
+Gelly provides a collection of scalable graph generators. Each generator is
+
+* parallelizable, in order to create large datasets
+* scale-free, generating the same graph regardless of parallelism
+* thrifty, using as few operators as possible
+
 End diff 
Could we add an overview of how graph generators are created and used? For example, that you
pass the parameters to the specific generator and then call `generate()` to get the graph?
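The pattern described in this comment (configure a specific generator through its parameters, then call `generate()`) might look roughly like the following plain-Java sketch. It deliberately avoids Flink dependencies: `StarGraphSketch` and its edge representation are hypothetical stand-ins for illustration, not Gelly's actual classes or signatures.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a generator class; this is NOT Gelly's actual API.
final class StarGraphSketch {
    private final long vertexCount;

    // Step 1: the parameters are passed to the specific generator up front.
    StarGraphSketch(long vertexCount) {
        if (vertexCount < 2) {
            throw new IllegalArgumentException("a star graph needs at least 2 vertices");
        }
        this.vertexCount = vertexCount;
    }

    // Step 2: generate() materializes the graph, here as {source, target} pairs
    // linking the center vertex 0 with every other vertex in both directions.
    List<long[]> generate() {
        List<long[]> edges = new ArrayList<>();
        for (long v = 1; v < vertexCount; v++) {
            edges.add(new long[] {0L, v});
            edges.add(new long[] {v, 0L});
        }
        return edges;
    }
}
```

With this sketch, `new StarGraphSketch(5).generate()` yields the edge list of a five-vertex star. A real generator would return a distributed graph rather than an in-memory list.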
> Gelly Graph Generators
> 
>
> Key: FLINK-2909
> URL: https://issues.apache.org/jira/browse/FLINK-2909
> Project: Flink
> Issue Type: New Feature
> Components: Gelly
> Affects Versions: 1.0.0
> Reporter: Greg Hogan
> Assignee: Greg Hogan
>
> Include a selection of graph generators in Gelly. Generated graphs will be useful for
performing scalability, stress, and regression testing as well as benchmarking and comparing
algorithms, for both Flink users and developers. Generated data is infinitely scalable yet
described by a few simple parameters and can often substitute for user data or sharing large
files when reporting issues.
> There are multiple categories of graphs, as documented by [NetworkX|https://networkx.github.io/documentation/latest/reference/generators.html]
and elsewhere.
> Graphs may be well-defined, e.g. the [Chvátal graph|https://en.wikipedia.org/wiki/Chv%C3%A1tal_graph].
These may be sufficiently small to populate locally.
> Graphs may be scalable, e.g. complete and star graphs. These should use Flink's distributed
parallelism.
> Graphs may be stochastic, e.g. [R-MAT graphs|http://snap.stanford.edu/class/cs224wreadings/chakrabarti04rmat.pdf].
A key consideration is that the graphs should source randomness from a seedable PRNG and
generate the same graph regardless of parallelism.
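One way to meet the "same graph regardless of parallelism" requirement is to split the work into fixed-size blocks and seed each block's PRNG from the user seed and the block index only, so that the number of workers never influences the random stream. The sketch below illustrates that idea in plain Java under stated assumptions: the class name, block size, and seed-mixing constant are hypothetical choices, the edges are drawn uniformly at random (this is not R-MAT), and the workers are simulated sequentially rather than run on Flink.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SplittableRandom;

// Hypothetical sketch; not taken from Gelly.
final class SeededEdgeSketch {
    // Fixed block size: the unit of work never depends on the number of workers.
    static final int BLOCK_SIZE = 4;

    // Edges of one block; the PRNG seed is derived from (seed, block) only.
    static List<long[]> block(long seed, long block, long vertexCount) {
        SplittableRandom rnd = new SplittableRandom(seed ^ (block * 0x9E3779B97F4A7C15L));
        List<long[]> edges = new ArrayList<>();
        for (int i = 0; i < BLOCK_SIZE; i++) {
            edges.add(new long[] {rnd.nextLong(vertexCount), rnd.nextLong(vertexCount)});
        }
        return edges;
    }

    // Simulates `parallelism` workers taking blocks round-robin,
    // then concatenates the blocks in block-index order.
    static List<long[]> generate(long seed, int blockCount, long vertexCount, int parallelism) {
        List<List<long[]>> byBlock = new ArrayList<>();
        for (int b = 0; b < blockCount; b++) {
            byBlock.add(null);
        }
        for (int worker = 0; worker < parallelism; worker++) {
            for (int b = worker; b < blockCount; b += parallelism) {
                byBlock.set(b, block(seed, b, vertexCount));
            }
        }
        List<long[]> edges = new ArrayList<>();
        for (List<long[]> blockEdges : byBlock) {
            edges.addAll(blockEdges);
        }
        return edges;
    }
}
```

Calling `generate` with the same seed and block count but different `parallelism` values returns identical edge lists, because each block's content is a function of its index alone.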

