flink-issues mailing list archives

From "BoWang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-12229) Implement Lazy Scheduling Strategy
Date Wed, 01 May 2019 06:05:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830880#comment-16830880 ]

BoWang commented on FLINK-12229:

Thanks [~till.rohrmann].

The advantage of only looking at the input result partitions is obvious, but I am wondering
whether there may be some negative effects. 1) Once any result partition finishes, the consumer
vertex will be scheduled, and the infos of the remaining result partitions would be sent to the TM
separately. That would mean a lot of additional network communication for updating partition info
if the number of input partitions is huge, whereas when looking at the IntermediateDataSet, all the
partition infos are bundled into the `TaskDeploymentDescriptor`. 2) There may be a resource
deadlock, e.g., consider a job with map-reduce-join job vertices. Both the reduce and join job
vertices have ANY input constraints, so some of their tasks could be scheduled but cannot finish
until all the input result partitions are ready. When the free resources of the cluster are not
sufficient, the running join tasks are waiting for all their input while some of the reduce tasks
are waiting for resources to be scheduled.
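
To make the two concerns concrete, here is a toy sketch in Python. It is not Flink code and does not use any Flink API. The function names, the slot model, and the `join_first` flag are all hypothetical and chosen only for illustration. It assumes that a consumer is deployed knowing one input partition and receives one update message for each remaining partition, and that a join task can be scheduled once any reduce task has finished but can only finish after all reduce tasks have finished.

```python
def partition_update_messages(num_consumers, num_input_partitions):
    """Concern 1 (assumed worst case): each consumer task is deployed knowing one
    input partition and gets a separate update for every remaining partition."""
    return num_consumers * max(num_input_partitions - 1, 0)


def simulate(slots, num_reduce, num_join, join_first):
    """Concern 2: toy slot model. Map is assumed finished, so every reduce task is
    schedulable. `join_first` is a hypothetical policy flag: True means freed slots
    go to join tasks before pending reduce tasks."""
    pending_reduce, pending_join = num_reduce, num_join
    running_reduce = running_join = finished_reduce = 0
    while True:
        free = slots - running_reduce - running_join
        # ANY input constraint: join is schedulable once any reduce has finished.
        join_ready = pending_join if finished_reduce > 0 else 0
        for kind in (("join", "reduce") if join_first else ("reduce", "join")):
            if kind == "join":
                n = min(free, join_ready)
                pending_join -= n
                running_join += n
            else:
                n = min(free, pending_reduce)
                pending_reduce -= n
                running_reduce += n
            free -= n
        progressed = False
        if running_reduce > 0:
            # Running reduce tasks have all their input and finish.
            finished_reduce += running_reduce
            running_reduce = 0
            progressed = True
        if finished_reduce == num_reduce and running_join > 0:
            # Join tasks can only finish once every reduce partition is ready.
            running_join = 0
            progressed = True
        if pending_reduce == pending_join == running_reduce == running_join == 0:
            return "finished"
        if not progressed:
            return "deadlock"


if __name__ == "__main__":
    print(partition_update_messages(num_consumers=100, num_input_partitions=1000))
    print(simulate(slots=2, num_reduce=4, num_join=2, join_first=True))
    print(simulate(slots=2, num_reduce=4, num_join=2, join_first=False))
```

In this model, two slots with four reduce tasks and two join tasks deadlock when the join tasks grab the freed slots, because the two remaining reduce tasks can never run. The same job finishes if the pending reduce tasks are served first or if there are at least four slots.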

Making SchedulingIntermediateDataSet part of LazyFromSourcesSchedulingStrategy would work;
I will do it that way.


> Implement Lazy Scheduling Strategy
> ----------------------------------
>                 Key: FLINK-12229
>                 URL: https://issues.apache.org/jira/browse/FLINK-12229
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>            Reporter: Gary Yao
>            Assignee: BoWang
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
> Implement a {{SchedulingStrategy}} that covers the functionality of {{ScheduleMode.LAZY_FROM_SOURCES}}, i.e., vertices are scheduled when all the input data are available.
> Acceptance Criteria:
>  * New strategy is tested in isolation using test implementations (i.e., without having to submit a job)

This message was sent by Atlassian JIRA