flink-issues mailing list archives

From "Yun Gao (Jira)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-15010) Temp directories flink-netty-shuffle-* are not cleaned up
Date Wed, 01 Jan 2020 16:35:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-15010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006448#comment-17006448 ]

Yun Gao commented on FLINK-15010:

The cause appears to be that, in standalone mode, TaskManagers are shut down by a SIGTERM
signal, so the cleanup of directories relies on shutdown hooks. However, no shutdown
hook is registered for the netty shuffle environment.

An intuitive approach is to add a shutdown hook directly for _NettyShuffleEnvironment_. However,
this cannot ensure the directories get cleaned up in all cases, since the directories are created
in the constructor of _FileChannelManagerImpl_, which runs before the shutdown hook would be
registered in the constructor of _NettyShuffleEnvironment_. If a TaskManager receives SIGTERM
between the two actions, the directories will not be cleaned up. Therefore, the current PR enhances
_FileChannelManagerImpl_ by allowing callers to specify whether to register a shutdown hook for
the manager, and the hook is registered before the directories are created.
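
To illustrate the ordering only, here is a minimal sketch in plain Java. It is not the code
from the PR: _TempDirManager_, its constructor parameters and the _dir()_ accessor are
hypothetical names, and a single directory stands in for the list of temp directories that
the real manager handles. The point is that the path is chosen and the hook is registered
before the directory exists on disk, so the window described above is much narrower.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.UUID;
import java.util.stream.Stream;

/** Hypothetical stand-in for a file channel manager; not Flink's actual class. */
final class TempDirManager implements AutoCloseable {
    private final Path dir;
    private final Thread shutdownHook; // null when the caller opted out

    TempDirManager(Path parent, String prefix, boolean registerShutdownHook) throws IOException {
        // 1. Fix the path first, so the hook already knows what to delete.
        this.dir = parent.resolve(prefix + UUID.randomUUID());

        // 2. Register the hook before anything exists on disk.
        Thread hook = null;
        if (registerShutdownHook) {
            hook = new Thread(this::deleteDir, "cleanup-" + prefix);
            Runtime.getRuntime().addShutdownHook(hook);
        }
        this.shutdownHook = hook;

        // 3. Only now create the directory.
        Files.createDirectory(dir);
    }

    Path dir() {
        return dir;
    }

    /** Best-effort recursive delete; a no-op if the directory was never created. */
    private void deleteDir() {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order visits children before their parent directory.
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.deleteIfExists(p);
                } catch (IOException ignored) {
                    // best effort during shutdown
                }
            });
        } catch (IOException ignored) {
            // best effort during shutdown
        }
    }

    @Override
    public void close() {
        if (shutdownHook != null) {
            try {
                Runtime.getRuntime().removeShutdownHook(shutdownHook);
            } catch (IllegalStateException ignored) {
                // JVM is already shutting down; the hook will run on its own
            }
        }
        deleteDir();
    }
}
```

A caller that needs cleanup on SIGTERM would pass _true_ for the last argument, while a caller
that already cleans up through some other path could pass _false_ and rely on _close()_.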

The same issue also exists for the existing _FileChannelManagerImpl_ usage in _IOManager_.
If the current fix is acceptable, we might also fix the _IOManager_ case in a similar way.

> Temp directories flink-netty-shuffle-* are not cleaned up
> ---------------------------------------------------------
>                 Key: FLINK-15010
>                 URL: https://issues.apache.org/jira/browse/FLINK-15010
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Network
>    Affects Versions: 1.9.1
>            Reporter: Nico Kruber
>            Assignee: Yun Gao
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
> Starting a Flink cluster with 2 TMs and stopping it again will leave 2 temporary directories
> (and not delete them): flink-netty-shuffle-<uid>

This message was sent by Atlassian Jira
