flink-issues mailing list archives

From "Piotr Nowojski (Jira)" <j...@apache.org>
Subject [jira] [Updated] (FLINK-14952) Yarn containers can exceed physical memory limits when using BoundedBlockingSubpartition.
Date Tue, 26 Nov 2019 13:44:00 GMT

     [ https://issues.apache.org/jira/browse/FLINK-14952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Piotr Nowojski updated FLINK-14952:
-----------------------------------
    Description: 
As [reported by a user on the user mailing list|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/CoGroup-SortMerger-performance-degradation-from-1-6-4-1-9-1-td31082.html],
the combination of {{BoundedBlockingSubpartition}} with YARN containers can cause the YARN
container to exceed its physical memory limits.
{quote}2019-11-19 12:49:23,068 INFO org.apache.flink.yarn.YarnResourceManager - Closing TaskExecutor
connection container_e42_1574076744505_9444_01_000004 because: Container [pid=42774,containerID=container_e42_1574076744505_9444_01_000004]
is running beyond physical memory limits. Current usage: 12.0 GB of 12 GB physical memory
used; 13.9 GB of 25.2 GB virtual memory used. Killing container.
{quote}
This is probably happening because the memory usage of mmap is neither capped nor accounted
for by the configured memory limits; YARN, however, does track this memory usage, and once
Flink exceeds the threshold the container is killed.
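
As an illustration (a standalone Java sketch, not Flink code; the file path and sizes are arbitrary): pages touched through an mmap'ed region become part of the process RSS that YARN's container monitor compares against the physical memory limit, while staying invisible to {{-Xmx}} and to Flink's own accounting.
{noformat}
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Illustrative sketch only (not Flink code): memory touched through an mmap'ed
// region shows up in the process RSS that YARN's container monitor compares
// against the physical memory limit, but is not covered by -Xmx or by Flink's
// configured memory pools.
public class MmapRssSketch {
    public static void main(String[] args) throws Exception {
        try (RandomAccessFile file = new RandomAccessFile("/tmp/mmap-rss-sketch.bin", "rw");
             FileChannel channel = file.getChannel()) {
            long size = 1L << 30; // 1 GiB mapping, far larger than a small heap
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, size);
            // Touching every page makes it resident: RSS (and thus YARN's usage
            // figure) grows by roughly 1 GiB while JVM heap usage stays flat.
            for (long pos = 0; pos < size; pos += 4096) {
                buffer.put((int) pos, (byte) 1);
            }
        }
    }
}
{noformat}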

The workaround is to override the default value and force Flink not to use mmap, by setting a
secret (🤫) config option:
{noformat}
taskmanager.network.bounded-blocking-subpartition-type: file
{noformat}
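
For local testing, the same option can also be passed programmatically; below is a minimal sketch assuming the {{Configuration}} API and a local {{ExecutionEnvironment}} (on a YARN deployment the key would normally go into {{flink-conf.yaml}} for the TaskManagers instead):
{noformat}
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.Configuration;

// Sketch: ask Flink to use the file-based (non-mmap) bounded blocking
// subpartition implementation. On YARN the same key would go into
// flink-conf.yaml rather than being set in code.
public class BoundedBlockingWorkaroundSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setString("taskmanager.network.bounded-blocking-subpartition-type", "file");

        ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment(conf);
        env.fromElements(1, 2, 3).print();
    }
}
{noformat}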

  was:
As [reported by a user on the user mailing list|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/CoGroup-SortMerger-performance-degradation-from-1-6-4-1-9-1-td31082.html],
the combination of {{BoundedBlockingSubpartition}} with YARN containers can cause the YARN
container to exceed its physical memory limits.

{noformat}
2019-11-19 12:49:23,068 INFO  org.apache.flink.yarn.YarnResourceManager                  
  - Closing TaskExecutor connection container_e42_1574076744505_9444_01_000004 because: Container
[pid=42774,containerID=container_e42_1574076744505_9444_01_000004] is running beyond physical
memory limits. Current usage: 12.0 GB of 12 GB physical memory used; 13.9 GB of 25.2 GB virtual
memory used. Killing container.
{noformat}
This is probably happening because the memory usage of mmap is neither capped nor accounted
for by the configured memory limits; YARN, however, does track this memory usage, and once
Flink exceeds the threshold the container is killed.


> Yarn containers can exceed physical memory limits when using BoundedBlockingSubpartition.
> -----------------------------------------------------------------------------------------
>
>                 Key: FLINK-14952
>                 URL: https://issues.apache.org/jira/browse/FLINK-14952
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / YARN, Runtime / Network
>    Affects Versions: 1.9.1
>            Reporter: Piotr Nowojski
>            Priority: Blocker
>             Fix For: 1.10.0
>
>
> As [reported by a user on the user mailing list|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/CoGroup-SortMerger-performance-degradation-from-1-6-4-1-9-1-td31082.html],
the combination of {{BoundedBlockingSubpartition}} with YARN containers can cause the YARN
container to exceed its physical memory limits.
> {quote}2019-11-19 12:49:23,068 INFO org.apache.flink.yarn.YarnResourceManager - Closing
TaskExecutor connection container_e42_1574076744505_9444_01_000004 because: Container [pid=42774,containerID=container_e42_1574076744505_9444_01_000004]
is running beyond physical memory limits. Current usage: 12.0 GB of 12 GB physical memory
used; 13.9 GB of 25.2 GB virtual memory used. Killing container.
> {quote}
> This is probably happening because the memory usage of mmap is neither capped nor accounted
for by the configured memory limits; YARN, however, does track this memory usage, and once
Flink exceeds the threshold the container is killed.
> The workaround is to override the default value and force Flink not to use mmap, by setting
a secret (🤫) config option:
> {noformat}
> taskmanager.network.bounded-blocking-subpartition-type: file
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
