flink-issues mailing list archives

From "chunpinghe (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Deleted] (FLINK-10884) Flink on yarn TM container will be killed by nodemanager because of the exceeded physical memory.
Date Fri, 08 Mar 2019 07:38:00 GMT

     [ https://issues.apache.org/jira/browse/FLINK-10884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chunpinghe updated FLINK-10884:
    Comment: was deleted

(was: What's your solution?

By default, YARN checks the physical memory used by each container. You can disable this check by setting
yarn.nodemanager.pmem-check-enabled to false. In your example, if your container uses too much off-heap
memory (direct memory or memory allocated through JNI malloc) so that its total memory exceeds 3 GB, the
container will be killed anyway.

So, if your container is always killed by the NodeManager, you should check whether the total memory you
provided for it is insufficient or whether your code has a memory leak (mainly a native memory leak).)
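A simplified model of the check described in the comment above, assuming that the NodeManager (with the physical-memory check enabled, which is the default) compares the container's total resident memory, meaning heap plus direct/off-heap plus other native allocations, against the container limit. The class and method names are hypothetical and this is not YARN's actual monitoring code; it only illustrates why a container whose heap and off-heap settings fit the limit can still be killed once native allocations are added.

```java
public class PmemCheckSketch {

    // Hypothetical model of the physical-memory check: the container is
    // killed when its total resident memory exceeds the container limit.
    // Real YARN measures the process tree's RSS; here we just add up parts.
    static boolean exceedsPhysicalLimit(long heapMb, long directMb, long nativeMb,
                                        long containerLimitMb) {
        long totalMb = heapMb + directMb + nativeMb;
        return totalMb > containerLimitMb;
    }

    public static void main(String[] args) {
        long limitMb = 3072; // the 3 GB container from the example

        // Heap + direct memory fit within 3 GB, but extra native (JNI malloc)
        // allocations push the total over the limit.
        System.out.println(exceedsPhysicalLimit(2048, 512, 600, limitMb));

        // Same heap and direct memory, smaller native footprint: stays under.
        System.out.println(exceedsPhysicalLimit(2048, 512, 100, limitMb));
    }
}
```

The first call models a container that would be killed (2048 + 512 + 600 MB is above 3072 MB), the second one that would survive.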



> Flink on yarn TM container will be killed by nodemanager because of the exceeded physical memory.
> ----------------------------------------------------------------------------------------------------
>                 Key: FLINK-10884
>                 URL: https://issues.apache.org/jira/browse/FLINK-10884
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / YARN, Runtime / Coordination
>    Affects Versions: 1.5.5, 1.6.2, 1.7.0
>         Environment: version: 1.6.2
> module: Flink on YARN
> CentOS, JDK 1.8
> Hadoop 2.7
>            Reporter: wgcn
>            Assignee: wgcn
>            Priority: Major
>              Labels: pull-request-available, yarn
> The TM container is killed by the NodeManager because it exceeds its physical memory limit.
> I found that the launch context for the TM container computes
> "container memory = heap memory + offHeapSizeMB" in the class
> org.apache.flink.runtime.clusterframework.ContaineredTaskManagerParameters (lines 160 to 166).
> I set a safety margin on the total memory the container uses. For example, if the container
> is limited to 3 GB of memory, the sum "heap memory + offHeapSizeMB" is set to 2.4 GB to prevent
> the container from being killed. Do we already have a ready-made solution, or can I commit mine?
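The safety-margin idea from the description can be sketched as follows, assuming the margin is expressed as a fraction of the container limit (a 20% margin turns a 3 GB limit into the roughly 2.4 GB "heap memory + offHeapSizeMB" budget mentioned above) and that a fixed off-heap size is subtracted from that budget to get the heap. The class, method names, and the 512 MB off-heap value are hypothetical illustrations, not the reporter's patch or Flink's actual code in ContaineredTaskManagerParameters.

```java
public class SafetyMarginSketch {

    // Hypothetical helper: memory (MB) left for heap + off-heap after reserving
    // a fraction of the container limit for native allocations that the JVM
    // flags do not bound (thread stacks, JNI malloc, etc.).
    static long budgetWithMarginMb(long containerLimitMb, double marginFraction) {
        if (marginFraction < 0.0 || marginFraction >= 1.0) {
            throw new IllegalArgumentException("marginFraction must be in [0, 1)");
        }
        return (long) Math.floor(containerLimitMb * (1.0 - marginFraction));
    }

    // Hypothetical helper: heap size once the fixed off-heap size is taken
    // out of the budget, so that heap + offHeapSizeMB == budget.
    static long heapMb(long budgetMb, long offHeapSizeMb) {
        long heap = budgetMb - offHeapSizeMb;
        if (heap <= 0) {
            throw new IllegalArgumentException("off-heap size leaves no room for the heap");
        }
        return heap;
    }

    public static void main(String[] args) {
        long limitMb = 3072;                            // 3 GB container, as in the report
        long budget = budgetWithMarginMb(limitMb, 0.2); // 20% margin, about 2.4 GB
        long offHeap = 512;                             // assumed off-heap size for illustration
        long heap = heapMb(budget, offHeap);
        System.out.println("heap + off-heap budget: " + budget + " MB");
        System.out.println("heap: " + heap + " MB, off-heap: " + offHeap + " MB");
    }
}
```

A fractional margin scales with the container size; the trade-off is that small containers get only a small absolute reserve, so a fixed minimum margin may also be worth considering.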

This message was sent by Atlassian JIRA
