flink-issues mailing list archives

From "Congxian Qiu(klion26) (Jira)" <j...@apache.org>
Subject [jira] [Updated] (FLINK-14340) Specify an unique DFSClient name for Hadoop FileSystem
Date Tue, 08 Oct 2019 04:19:00 GMT

     [ https://issues.apache.org/jira/browse/FLINK-14340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Congxian Qiu(klion26) updated FLINK-14340:
------------------------------------------
    Description: 
Currently, when Flink reads from or writes to HDFS, we do not set the DFSClient name for the connections,
so we cannot distinguish the connections, nor quickly find the specific Job or TM.

This issue proposes using the {{container_id}} as a unique name when initializing the Hadoop FileSystem,
so we can easily tell which Job/TM each connection belongs to.

 

The core change is to add a line like the following in {{org.apache.flink.runtime.fs.hdfs.HadoopFsFactory#create}}:

 
{code:java}
hadoopConfig.set("mapreduce.task.attempt.id",
    System.getenv().getOrDefault(CONTAINER_KEY_IN_ENV, DEFAULT_CONTAINER_ID));{code}
 

Currently, both {{YarnResourceManager}} and {{MesosResourceManager}} define the environment
key {{ENV_FLINK_CONTAINER_ID = "_FLINK_CONTAINER_ID"}}, so we may also want to introduce this
key in {{StandaloneResourceManager}}.
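The fallback logic above can be sketched as follows. Note this is only an illustration: the constant values, the helper method, and the class name are hypothetical stand-ins, not Flink's actual code; in the real {{HadoopFsFactory#create}} the resolved value would be passed to {{hadoopConfig.set("mapreduce.task.attempt.id", ...)}}.

{code:java}
import java.util.Map;

public class DfsClientNameSketch {
    // Hypothetical constants mirroring the names used in the issue text;
    // the real definitions would live in Flink's runtime (assumption).
    static final String CONTAINER_KEY_IN_ENV = "_FLINK_CONTAINER_ID";
    static final String DEFAULT_CONTAINER_ID = "unknown-container";

    /**
     * Resolve the value that would be set as "mapreduce.task.attempt.id":
     * the container id from the environment, or a default when running
     * outside YARN/Mesos (e.g. standalone mode).
     */
    static String resolveClientName(Map<String, String> env) {
        return env.getOrDefault(CONTAINER_KEY_IN_ENV, DEFAULT_CONTAINER_ID);
    }

    public static void main(String[] args) {
        // In HadoopFsFactory#create the map would be System.getenv().
        System.out.println(resolveClientName(System.getenv()));
    }
}{code}

With this fallback, connections from a standalone cluster would at least carry a recognizable default name instead of an unset one.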


> Specify an unique DFSClient name for Hadoop FileSystem
> ------------------------------------------------------
>
>                 Key: FLINK-14340
>                 URL: https://issues.apache.org/jira/browse/FLINK-14340
>             Project: Flink
>          Issue Type: Improvement
>          Components: FileSystems
>            Reporter: Congxian Qiu(klion26)
>            Priority: Major
>             Fix For: 1.10.0
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
