flink-issues mailing list archives

From "Greg Hogan (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (FLINK-7019) Rework parallelism in Gelly algorithms and examples
Date Fri, 07 Jul 2017 00:41:00 GMT

     [ https://issues.apache.org/jira/browse/FLINK-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Greg Hogan updated FLINK-7019:
------------------------------
    Description: 
Flink job parallelism is set with {{ExecutionConfig#setParallelism}} or with {{-p}} on the
command line. The Gelly algorithms {{JaccardIndex}}, {{AdamicAdar}}, {{TriangleListing}},
and {{ClusteringCoefficient}} have intermediate operators which generate output quadratic
in the size of their input. These algorithms may need to run with high parallelism, but doing
so for all operations is wasteful. This is why "little parallelism" was introduced.
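To make the "quadratic" claim concrete, here is a small Java illustration (not Gelly code) that counts the neighbor pairs a per-vertex pair-generating step would emit, assuming each vertex of degree d pairs up all of its neighbors. The names {{PairCount}}, {{pairsForDegree}}, and {{totalPairs}} are hypothetical.

```java
import java.util.List;

// Hypothetical illustration, not part of Gelly: counts the unordered
// neighbor pairs that a step pairing up every vertex's neighbors would emit.
final class PairCount {
    // A vertex with degree d contributes d * (d - 1) / 2 unordered neighbor pairs,
    // so the output grows quadratically with the degree.
    static long pairsForDegree(long d) {
        return d * (d - 1) / 2;
    }

    // Total pairs emitted for a graph, given the degree of each vertex.
    static long totalPairs(List<Long> degrees) {
        long total = 0;
        for (long d : degrees) {
            total += pairsForDegree(d);
        }
        return total;
    }
}
```

For example, the center of a star graph with 1,000 edges has degree 1,000, so {{pairsForDegree(1_000)}} returns 499,500 pairs from that single vertex. This explains why these intermediate operators, rather than the whole job, are the ones that benefit from high parallelism.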

This can be simplified by moving the parallelism parameter to the new common base class, with
the rule of thumb that the algorithm parallelism is used for all normal (small-output) operators.
The asymptotically large operators will default to the job parallelism, as will the algorithm
parallelism itself when it is not explicitly set.
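The proposed rule can be modeled in a few lines of Java. This is a sketch under stated assumptions, not the Gelly implementation: {{AlgorithmBase}}, {{smallOperatorParallelism}}, and {{largeOperatorParallelism}} are hypothetical names, and the {{-1}} sentinel mirrors the value of Flink's {{ExecutionConfig.PARALLELISM_DEFAULT}}.

```java
// Hypothetical stand-in for the proposed common base class; not Gelly's API.
class AlgorithmBase {
    // Sentinel meaning "not set"; mirrors Flink's ExecutionConfig.PARALLELISM_DEFAULT (-1).
    static final int PARALLELISM_DEFAULT = -1;

    // The algorithm parallelism defaults to "not set", i.e. the job parallelism.
    private int parallelism = PARALLELISM_DEFAULT;

    AlgorithmBase setParallelism(int parallelism) {
        if (parallelism < 1 && parallelism != PARALLELISM_DEFAULT) {
            throw new IllegalArgumentException("parallelism must be at least 1");
        }
        this.parallelism = parallelism;
        return this;
    }

    // Normal (small-output) operators: use the algorithm parallelism if set,
    // otherwise fall back to the job parallelism.
    int smallOperatorParallelism(int jobParallelism) {
        return parallelism == PARALLELISM_DEFAULT ? jobParallelism : parallelism;
    }

    // Asymptotically large operators: left at the job parallelism.
    int largeOperatorParallelism(int jobParallelism) {
        return jobParallelism;
    }
}
```

With a job parallelism of 64 and {{setParallelism(4)}}, the small-output operators run with 4 while the quadratic operators keep 64. In Flink's DataSet API, an operator whose parallelism is left unset (or set to {{ExecutionConfig.PARALLELISM_DEFAULT}}) runs with the job parallelism, so a real implementation would presumably only need to call {{setParallelism}} on the small-output operators.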

  was:
Flink job parallelism is set with {{ExecutionConfig#setParallelism}} or when {{-p}} on the
command-line. The Gelly algorithms {{JaccardIndex}}, {{AdamicAdar}}, {{TriangleListing}},
and {{ClusteringCoefficient}} have intermediate operators which generated output quadratic
in the size of input. These algorithms may need to be run with a high parallelism but doing
so for all operations is wasteful. Thus was introduced "little parallelism".

This can be simplified by moving the parallelism parameter to the new common base class and
with the rule-of-thumb to use the algorithm parallelism for all normal (small output) operators.
The asymptotically large operators will default to the job parallelism, as will the default
algorithm parallelism.


> Rework parallelism in Gelly algorithms and examples
> ---------------------------------------------------
>
>                 Key: FLINK-7019
>                 URL: https://issues.apache.org/jira/browse/FLINK-7019
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Gelly
>    Affects Versions: 1.4.0
>            Reporter: Greg Hogan
>            Assignee: Greg Hogan
>            Priority: Minor
>             Fix For: 1.4.0
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
