flink-issues mailing list archives

From "Zhu Zhu (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLINK-12138) Limit input split count of each source task for better failover experience
Date Tue, 09 Apr 2019 08:00:00 GMT
Zhu Zhu created FLINK-12138:
-------------------------------

             Summary: Limit input split count of each source task for better failover experience
                 Key: FLINK-12138
                 URL: https://issues.apache.org/jira/browse/FLINK-12138
             Project: Flink
          Issue Type: Improvement
          Components: Runtime / Coordination
    Affects Versions: 1.9.0
            Reporter: Zhu Zhu
            Assignee: Zhu Zhu


Flink currently uses an InputSplitAssigner to dynamically assign input splits to source tasks.
A task requests a new split after it finishes processing the previous one, which gives
better load balancing.

However, when there are fewer slots than source tasks, only the first launched source tasks
can request splits, and they keep running until all the splits are consumed. This is
not failover friendly, as users sometimes intentionally set a larger parallelism to reduce the
failover impact.

For example, a job runs in a 10-slot session and has a source vertex with parallelism 1000 that
consumes 10000 splits; none of the vertices are connected to others. Currently, 10 of the 1000
source tasks will be launched and will only finish after all the input splits are consumed. If a
task fails, up to ~1000 splits need to be re-processed. If instead all 1000 tasks could share
the splits, only ~10 splits would need to be re-processed.
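
The numbers in this example follow from dividing the splits evenly among the tasks that
actually get to request them. A minimal sketch of that arithmetic in Python; the function
name is hypothetical, and the even distribution is an assumption that only holds roughly
in practice:

```python
def worst_case_reprocessed_splits(total_splits: int, tasks_sharing_splits: int) -> int:
    """Splits handled by one task, assuming splits are spread evenly (assumption).

    If that task fails, this is roughly how many splits must be re-processed.
    """
    if total_splits < 0 or tasks_sharing_splits <= 0:
        raise ValueError("need total_splits >= 0 and tasks_sharing_splits > 0")
    # Integer ceiling division: the busiest task gets at most this many splits.
    return (total_splits + tasks_sharing_splits - 1) // tasks_sharing_splits


# 10 slots: only the first 10 tasks ever request splits.
print(worst_case_reprocessed_splits(10_000, 10))     # 1000
# All 1000 tasks share the splits.
print(worst_case_reprocessed_splits(10_000, 1_000))  # 10
```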

 

We propose adding a cap on the number of input splits that each source task may process. Once
the cap is reached, the task cannot get any more splits from the InputSplitAssigner and then
finishes. This frees its slot for other source tasks.
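
The effect of such a cap can be illustrated with a small standalone simulation. This is
only a sketch in Python, not Flink's InputSplitAssigner interface: the class
CappedSplitAssigner, the function run_job, and the round-based scheduling are all
hypothetical and assume every split takes the same time to process.

```python
from collections import deque
from typing import Deque, Dict, Iterable, List, Optional


class CappedSplitAssigner:
    """Hands out splits on request, but at most `cap` per task (hypothetical)."""

    def __init__(self, splits: Iterable[int], cap: int) -> None:
        if cap <= 0:
            raise ValueError("cap must be positive")
        self._pending: Deque[int] = deque(splits)
        self._cap = cap
        self._assigned: Dict[int, int] = {}

    def next_split(self, task_id: int) -> Optional[int]:
        """Return the next split, or None if the task hit its cap or none remain."""
        count = self._assigned.get(task_id, 0)
        if count >= self._cap or not self._pending:
            return None
        self._assigned[task_id] = count + 1
        return self._pending.popleft()


def run_job(num_splits: int, num_tasks: int, num_slots: int, cap: int) -> Dict[int, int]:
    """Simulate the job in rounds; return the number of splits each task processed."""
    assigner = CappedSplitAssigner(range(num_splits), cap)
    waiting: Deque[int] = deque(range(num_tasks))  # tasks without a slot yet
    running: List[int] = []
    processed: Dict[int, int] = {}
    while waiting or running:
        # Fill free slots with waiting tasks.
        while waiting and len(running) < num_slots:
            running.append(waiting.popleft())
        still_running: List[int] = []
        for task in running:
            if assigner.next_split(task) is None:
                # The task finishes here, which releases its slot.
                processed.setdefault(task, 0)
            else:
                processed[task] = processed.get(task, 0) + 1
                still_running.append(task)
        running = still_running
    return processed


capped = run_job(num_splits=10_000, num_tasks=1_000, num_slots=10, cap=10)
uncapped = run_job(num_splits=10_000, num_tasks=1_000, num_slots=10, cap=10_000)
print(max(capped.values()))    # 10
print(max(uncapped.values()))  # 1000
```

In this sketch, splits are left unprocessed if cap * num_tasks is smaller than num_splits,
so the cap has to be chosen with the parallelism in mind.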

Theoretically, a proper value for the cap would be max(input split size) / avg(input split
size).
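
As a worked instance of this ratio, here is a minimal Python sketch. The function name
suggested_split_cap is hypothetical, split sizes are assumed to be positive integers
(for example bytes), and rounding the ratio up to a whole number of splits is an
assumption that the proposal does not specify.

```python
from typing import Sequence


def suggested_split_cap(split_sizes: Sequence[int]) -> int:
    """Return ceil(max(size) / avg(size)) for the given split sizes.

    Rounding up is an assumption of this sketch, so the cap is always >= 1.
    """
    if not split_sizes or any(size <= 0 for size in split_sizes):
        raise ValueError("split_sizes must be non-empty and all positive")
    largest = max(split_sizes)
    total = sum(split_sizes)
    count = len(split_sizes)
    # max / avg == max * count / total; use integer ceiling division to avoid floats.
    return (largest * count + total - 1) // total


# Three splits of 100 and one of 500: avg = 200, max = 500, ratio = 2.5.
print(suggested_split_cap([100, 100, 100, 500]))  # 3
# Uniform splits: the ratio is exactly 1.
print(suggested_split_cap([64, 64, 64]))          # 1
```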



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
