flink-issues mailing list archives

From "Yang Wang (Jira)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-15449) Retain lost task managers on Flink UI
Date Thu, 02 Jan 2020 03:18:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006553#comment-17006553 ]

Yang Wang commented on FLINK-15449:

I think it is a valid user experience improvement. However, if we retain all the lost TaskManagers,
it will cost more memory in the JobManager. When TaskManagers fail over frequently, the JobManager
will run out of memory (OOM). If we add a threshold for removing lost TaskManagers, it will not make
much difference compared with the current behavior.


I want to share how to debug a lost TaskManager today. First, you need to find which NodeManager
the lost TaskManager was located on. Then use the scheme http://{RM_Address:PORT}/node/containerlogs/{container_id}/{user}
to construct the log URL. The log URL can be used until the application has finished.
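
The URL scheme above can be filled in with a few lines of Python. This is only a minimal sketch: the function name, the host name, the port 8042, the container id, and the user below are hypothetical placeholders, not values from this issue. The host:port part is passed in as a single argument, so use whatever address serves the /node/containerlogs page in your cluster.

```python
from urllib.parse import quote


def container_log_url(address: str, container_id: str, user: str) -> str:
    """Fill in the scheme http://{address}/node/containerlogs/{container_id}/{user}.

    `address` is "host:port" of the web endpoint that serves the container logs.
    """
    # Percent-encode the path segments so unusual characters cannot break the URL.
    return "http://{}/node/containerlogs/{}/{}".format(
        address,
        quote(container_id, safe=""),
        quote(user, safe=""),
    )


# Hypothetical values, for illustration only.
print(container_log_url(
    "nm-host-01:8042",
    "container_1577900000000_0001_01_000002",
    "flink",
))
```

The container id of the lost TaskManager and the node it ran on have to be looked up first (for example in the JobManager log); the sketch only assembles the URL.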

> Retain lost task managers on Flink UI
> -------------------------------------
>                 Key: FLINK-15449
>                 URL: https://issues.apache.org/jira/browse/FLINK-15449
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / YARN
>    Affects Versions: 1.9.1
>            Reporter: Victor Wong
>            Priority: Major
> With Flink on Yarn, our TaskManagers are sometimes killed because of OOM, heartbeat
> timeout, or other reasons, and it is not convenient to check the logs of a lost TaskManager.
> Can we retain the lost task managers on the Flink UI, and provide the log service through
> Yarn (we can redirect the URL of log/stdout to the Yarn container log/stdout)?

