flink-issues mailing list archives

From "Zhijiang Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-4364) Implement TaskManager side of heartbeat from JobManager
Date Wed, 09 Nov 2016 10:30:58 GMT

    [ https://issues.apache.org/jira/browse/FLINK-4364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15650576#comment-15650576
] 

Zhijiang Wang commented on FLINK-4364:
--------------------------------------

Hi [~till.rohrmann], the heartbeat interaction between TM and JM is almost the same as the
one with RM that we discussed before.
The TM will have another, separate {{HeartbeatManagerImpl}} and {{HeartbeatListener}} for
the JM heartbeat.
The TM will also start monitoring the {{HeartbeatTarget}} once it has successfully registered
at a new JM through the HA mechanism.

There are two points to confirm:
1. If the TM detects the JM as dead by heartbeat timeout, the TM should not release the tasks
and slots that belong to that JM. The TM should do nothing when notified of the heartbeat
timeout. It will register at the new JM found through HA and offer the related slots if
possible. This is tied to the JM failure recovery process. If the JM detects a TM as dead by
heartbeat timeout, it will release all slots related to that TM and request new ones from
the RM.
2. For the payload, I am currently not sure what information needs to be reported with the
heartbeat. The JM may need the {{SlotPool}} to be consistent with {{SlotOffer}}, and this
also touches other processes. So I think we can pass null as the payload in the current
implementation and only put the monitoring in place. Later we can extend the payload as
needed.
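
To make the two points concrete, here is a rough sketch of the behaviour I have in mind. All
names in it ({{HeartbeatSketch}}, {{Listener}}, {{Monitor}}, {{TaskManagerSide}},
{{JobManagerSide}}) are made up for illustration and are not the real Flink classes; the clock
is passed in by hand and slots are plain strings, so this only shows the intended reactions to
a timeout, not an implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the proposal; none of these types are Flink's real classes.
final class HeartbeatSketch {

    /** Callbacks a heartbeat monitor needs from its owner (assumed shape). */
    interface Listener {
        void notifyHeartbeatTimeout(String peerId);
        Object retrievePayload();
    }

    /** Tracks the last heartbeat per monitored peer; the caller supplies the clock. */
    static final class Monitor {
        private final long timeoutMillis;
        private final Listener listener;
        private final Map<String, Long> lastSeen = new HashMap<>();

        Monitor(long timeoutMillis, Listener listener) {
            this.timeoutMillis = timeoutMillis;
            this.listener = listener;
        }

        /** Start monitoring a peer, e.g. after a successful registration. */
        void monitorTarget(String peerId, long now) {
            lastSeen.put(peerId, now);
        }

        /** Record a heartbeat; heartbeats from unmonitored peers are ignored. */
        void receiveHeartbeat(String peerId, long now) {
            if (lastSeen.containsKey(peerId)) {
                lastSeen.put(peerId, now);
            }
        }

        /** Stop monitoring every peer whose deadline has passed, then notify the listener. */
        void checkTimeouts(long now) {
            List<String> expired = new ArrayList<>();
            for (Map.Entry<String, Long> entry : lastSeen.entrySet()) {
                if (now - entry.getValue() > timeoutMillis) {
                    expired.add(entry.getKey());
                }
            }
            for (String peerId : expired) {
                lastSeen.remove(peerId);
                listener.notifyHeartbeatTimeout(peerId);
            }
        }
    }

    /** TM side (point 1): a JM timeout leaves tasks and slots untouched. */
    static final class TaskManagerSide implements Listener {
        final Map<String, List<String>> slotsByJob = new HashMap<>();

        @Override
        public void notifyHeartbeatTimeout(String jobManagerId) {
            // Intentionally empty: slots are kept until a new JM leader is found via HA.
        }

        @Override
        public Object retrievePayload() {
            return null; // point 2: no payload for now
        }

        /** Slots to offer once registration at the new JM leader succeeded. */
        List<String> offerSlots(String jobId) {
            return new ArrayList<>(slotsByJob.getOrDefault(jobId, new ArrayList<>()));
        }
    }

    /** JM side (point 1): a TM timeout releases its slots and re-requests them from the RM. */
    static final class JobManagerSide implements Listener {
        final Map<String, List<String>> slotsByTaskManager = new HashMap<>();
        final List<String> slotRequestsToResourceManager = new ArrayList<>();

        @Override
        public void notifyHeartbeatTimeout(String taskManagerId) {
            List<String> lost = slotsByTaskManager.remove(taskManagerId);
            if (lost != null) {
                slotRequestsToResourceManager.addAll(lost);
            }
        }

        @Override
        public Object retrievePayload() {
            return null; // point 2: no payload for now
        }
    }
}
```

The asymmetry is the important part: the same timeout notification is a no-op on the TM and a
release-and-re-request on the JM, and both sides report a null payload until we know what the
JM really needs.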

Do you think the above points are feasible? If so, I will work on it this week.

> Implement TaskManager side of heartbeat from JobManager
> -------------------------------------------------------
>
>                 Key: FLINK-4364
>                 URL: https://issues.apache.org/jira/browse/FLINK-4364
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Cluster Management
>            Reporter: Zhijiang Wang
>            Assignee: Zhijiang Wang
>
> The {{JobManager}} initiates heartbeat messages via (JobID, JmLeaderID), and the {{TaskManager}}
> will report metrics info for each heartbeat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
