flink-issues mailing list archives

From "TisonKun (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-10640) Enable Slot Resource Profile for Resource Management
Date Wed, 05 Dec 2018 13:07:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-10640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16710041#comment-16710041
] 

TisonKun commented on FLINK-10640:
----------------------------------

@[~wuzang]

After an offline discussion with [~till.rohrmann] about the "TM Management" part of this issue, i.e.,
starting an arbitrary number of TMs when a YARN session is launched, I propose introducing a pair (min, max) that represents
the minimum and maximum number of running {{TaskExecutor}}s.

With such an option, setting {{minimum = maximum = n}} effectively gives the same behaviour
as the pre-FLIP-6 code, that is, a fixed number of pre-allocated TMs; and
setting {{minimum = 0, maximum = inf}} effectively gives the same behaviour as the current code
path. I think such a feature improves "TM Management", especially when a user wants to run
a job on a specific cluster, and it requires fewer changes than achieving an arbitrarily flexible
"TM Management".
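
A minimal sketch of how such a (min, max) bound might be applied. The class {{TaskExecutorBounds}}, its methods and the {{UNBOUNDED}} constant are hypothetical names used only for illustration; none of them exist in Flink, and the real ResourceManager logic would be more involved.

```java
// Hypothetical sketch: none of these names are part of Flink's API.
final class TaskExecutorBounds {
    /** Stand-in for "inf": no upper limit on the number of TaskExecutors. */
    static final int UNBOUNDED = Integer.MAX_VALUE;

    final int min;
    final int max;

    TaskExecutorBounds(int min, int max) {
        if (min < 0 || max < min) {
            throw new IllegalArgumentException("require 0 <= min <= max");
        }
        this.min = min;
        this.max = max;
    }

    /** Number of TaskExecutors to keep running, given the number demanded by pending slot requests. */
    int target(int demanded) {
        return Math.max(min, Math.min(max, demanded));
    }

    /** How many TaskExecutors to start (positive) or release (negative). */
    int delta(int running, int demanded) {
        return target(demanded) - running;
    }

    public static void main(String[] args) {
        // minimum = maximum = n: a fixed, pre-allocated pool regardless of demand.
        TaskExecutorBounds fixed = new TaskExecutorBounds(4, 4);
        // minimum = 0, maximum = inf: TaskExecutors follow demand.
        TaskExecutorBounds elastic = new TaskExecutorBounds(0, UNBOUNDED);

        System.out.println(fixed.target(0));
        System.out.println(fixed.target(10));
        System.out.println(elastic.target(0));
        System.out.println(elastic.target(10));
    }
}
```

With {{min = max = 4}} the target stays at 4 whatever the demand, while with {{min = 0, max = UNBOUNDED}} the target simply equals the demand.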

What do you think?

> Enable Slot Resource Profile for Resource Management
> ----------------------------------------------------
>
>                 Key: FLINK-10640
>                 URL: https://issues.apache.org/jira/browse/FLINK-10640
>             Project: Flink
>          Issue Type: New Feature
>          Components: ResourceManager
>            Reporter: Tony Xintong Song
>            Priority: Major
>
> Motivation & Backgrounds
>  * The existing concept of task slots roughly represents how many pipelines of tasks a
TaskManager can hold. However, it does not consider the differences in resource needs and
usage of individual tasks. Enabling resource profiles of slots may allow Flink to better allocate
execution resources according to tasks' fine-grained resource needs.
>  * The community version of Flink already contains APIs and some implementation for slot
resource profiles. However, this logic is not actually used. (The ResourceProfile of slot requests
is by default set to UNKNOWN with negative values, and thus matches any given slot.)
> Preliminary Design
>  * Slot Management
>  A slot represents a certain amount of resources for a single pipeline of tasks to run
in on a TaskManager. Initially, a TaskManager does not have any slots but a total amount of
resources. When allocating, the ResourceManager finds proper TMs to generate new slots for
the tasks to run according to the slot requests. Once generated, the slot's size (resource
profile) does not change until it's freed. ResourceManager can apply different, portable strategies
to allocate slots from TaskManagers.
>  * TM Management
>  The size and number of TaskManagers and when to start them can also be flexible. TMs
can be started and released dynamically, and may have different sizes. We may have many different,
portable strategies, e.g., an elastic session that can run multiple jobs like the session
mode while dynamically adjusting the size of the session (number of TMs) according to the real-time
workload.
>  * About Slot Sharing
>  Slot sharing is a good heuristic to easily calculate how many slots are needed to get the
job running and to get better utilization when there is no resource profile in slots. However,
with resource profiles enabling finer-grained resource management, each individual task has
its specific resource need and it does not make much sense to have multiple tasks sharing
the resource of the same slot. Instead, we may introduce locality preferences/constraints
to support the semantics of putting tasks in same/different TMs in a more general way.
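
The "Slot Management" idea in the quoted description (a TaskManager starts with a total amount of resources and slots are generated from it on request) could be sketched roughly as follows. This is only an illustration under simplifying assumptions: {{DynamicSlotTaskManager}} and its methods are hypothetical names, resources are reduced to CPU cores and memory, and this is not Flink's actual implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: these names are illustrative and are not Flink's API.
final class DynamicSlotTaskManager {
    private double freeCpuCores;
    private int freeMemoryMb;
    private int nextSlotId = 0;
    // slot id -> {cpuCores, memoryMb} of the generated slot
    private final Map<Integer, double[]> slots = new HashMap<>();

    /** Initially the TaskManager has no slots, only a total amount of resources. */
    DynamicSlotTaskManager(double totalCpuCores, int totalMemoryMb) {
        this.freeCpuCores = totalCpuCores;
        this.freeMemoryMb = totalMemoryMb;
    }

    /** Generates a slot of the requested size; returns its id, or -1 if it does not fit. */
    int allocate(double cpuCores, int memoryMb) {
        if (cpuCores > freeCpuCores || memoryMb > freeMemoryMb) {
            return -1;
        }
        freeCpuCores -= cpuCores;
        freeMemoryMb -= memoryMb;
        int id = nextSlotId++;
        slots.put(id, new double[] {cpuCores, memoryMb});
        return id;
    }

    /** Frees a slot and returns its resources; the slot size never changes before this. */
    void free(int slotId) {
        double[] size = slots.remove(slotId);
        if (size != null) {
            freeCpuCores += size[0];
            freeMemoryMb += (int) size[1];
        }
    }

    int slotCount() {
        return slots.size();
    }
}
```

A ResourceManager strategy would then choose which TaskManager to call {{allocate}} on for each slot request; that choice is deliberately left out of the sketch.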



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
