flink-issues mailing list archives

From "Yun Gao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-12171) The network buffer memory size should not be checked against the heap size on the TM side
Date Tue, 28 May 2019 03:35:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-12171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16849304#comment-16849304 ]

Yun Gao commented on FLINK-12171:

After analyzing this problem further, I now think we do not need to check the maximum allowed
memory on the TM side.

On the RM side, we compute the network memory size from the total memory size. The configured
MIN and MAX may be so large that the resulting network memory exceeds the total memory size,
so we need to check against that.
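
The RM-side computation described above can be sketched as follows. This is a minimal illustration, not Flink's actual code: the class and method names are hypothetical, and it assumes the network memory is a fraction of the total memory clamped into the configured [MIN, MAX] range and that the check is a strict "less than".

```java
// Hypothetical sketch of the RM-side computation; names are not Flink's.
final class RmNetworkMemorySketch {

    /** Assumed rule: fraction of the total memory, clamped into [minBytes, maxBytes]. */
    public static long networkBytes(long totalBytes, double fraction, long minBytes, long maxBytes) {
        long byFraction = (long) (fraction * totalBytes);
        return Math.min(maxBytes, Math.max(minBytes, byFraction));
    }

    /** The RM knows the total memory, so comparing against it is meaningful. */
    public static boolean isValid(long totalBytes, long networkBytes) {
        return networkBytes < totalBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long total = 1024 * mb;

        // Moderate MIN/MAX: the fraction decides and the result fits into the total.
        long fits = networkBytes(total, 0.1, 64 * mb, 1024 * mb);
        System.out.println("network=" + fits + " valid=" + isValid(total, fits));

        // MIN larger than the total: the clamped result exceeds the total memory.
        long tooLarge = networkBytes(total, 0.1, 2048 * mb, 4096 * mb);
        System.out.println("network=" + tooLarge + " valid=" + isValid(total, tooLarge));
    }
}
```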

However, on the TM side, we do not know the total memory size; we only know the heap size.
We can only deduce the total memory size as heap size + computed network memory, which is
always larger than the computed network memory, so such a check would always pass.

Therefore, unless we make the total memory size available on the TM side and also
compute the network memory size from the total memory size there, we cannot meaningfully
check the network memory size.

Based on the above analysis, I think we can start by simply removing the comparison between
the network memory size and the heap memory size. This comparison is incorrect because the
network memory is not part of the heap memory, and it may raise an error when the configuration
is in fact reasonable.
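
To make the two points above concrete, here is a small sketch with made-up sizes. The class and method names are hypothetical and only mirror the comparisons discussed in this comment; they are not Flink's code.

```java
// Hypothetical illustration of the TM-side comparisons; not Flink's code.
final class TmHeapCheckSketch {

    /** The comparison proposed for removal: network memory must be below the max heap (-Xmx). */
    public static boolean heapCheckPasses(long networkBytes, long maxJvmHeapBytes) {
        return networkBytes < maxJvmHeapBytes;
    }

    /** The only total the TM can deduce on its own: heap plus computed network memory. */
    public static long deducedTotal(long heapBytes, long networkBytes) {
        return heapBytes + networkBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long heap = 1024 * mb;     // assumed -Xmx
        long network = 1536 * mb;  // assumed network memory, outside the heap

        // Network memory is larger than the heap, yet nothing is wrong with this setup.
        System.out.println("heap check passes: " + heapCheckPasses(network, heap));

        // Checking against the deduced total cannot fail for a positive heap size.
        long total = deducedTotal(heap, network);
        System.out.println("network < deduced total: " + (network < total));
    }
}
```

With these numbers the heap comparison rejects the configuration, while the comparison against the deduced total passes trivially, which is why neither is a useful check on the TM side.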



> The network buffer memory size should not be checked against the heap size on the TM side
> -----------------------------------------------------------------------------------------
>                 Key: FLINK-12171
>                 URL: https://issues.apache.org/jira/browse/FLINK-12171
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Network
>    Affects Versions: 1.7.2, 1.8.0
>         Environment: Flink 1.7.2; Flink 1.8 does not seem to have modified the logic here.
>            Reporter: Yun Gao
>            Assignee: Yun Gao
>            Priority: Major
> Currently, when computing the network buffer memory size on the TM side in _TaskManagerService#calculateNetworkBufferMemory_ (version
1.8 or 1.7) or _NetworkEnvironmentConfiguration#calculateNewNetworkBufferMemory_ (master),
the computed network buffer memory size is checked to be less than _maxJvmHeapMemory_. However,
on the TM side, _maxJvmHeapMemory_ stores the maximum heap memory (namely -Xmx).
> With the above process, when the TM starts, -Xmx is computed in the RM or in _taskmanager.sh_
as (container memory - network buffer memory - managed memory). The above check therefore
implies that the heap memory of the TM must be larger than the network memory, which does
not seem necessary.
> Therefore, I think the network buffer memory size should be checked against the
total memory instead of the heap memory on the TM side:
>  # Check that networkBufFraction < 1.0.
>  # Compute the total memory as jvmHeapNoNet / (1 - networkBufFraction).
>  # Compare the network buffer memory with the total memory.
> This check is also consistent with the similar one done on the RM side.
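
The three steps proposed in the issue description can be sketched as below. This is only an illustration under stated assumptions: the class and method names are hypothetical, _jvmHeapNoNet_ is taken to be the memory excluding network buffers, and the exceptions thrown are a choice made for this sketch rather than Flink's behavior.

```java
// Hypothetical sketch of the proposed TM-side check; not Flink's code.
final class ProposedTmCheckSketch {

    /** Steps 1 and 2: validate the fraction, then derive the total from the non-network memory. */
    public static long totalFromHeap(long jvmHeapNoNetBytes, double networkBufFraction) {
        if (!(networkBufFraction < 1.0)) {
            throw new IllegalArgumentException("networkBufFraction must be < 1.0");
        }
        return (long) (jvmHeapNoNetBytes / (1.0 - networkBufFraction));
    }

    /** Step 3: compare the network buffer memory with the derived total. */
    public static void checkAgainstTotal(long networkBytes, long totalBytes) {
        if (networkBytes >= totalBytes) {
            throw new IllegalStateException(
                "network memory " + networkBytes + " is not below total memory " + totalBytes);
        }
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long total = totalFromHeap(900 * mb, 0.1);
        // Assumed network size for illustration: the fraction of the derived total.
        long network = (long) (0.1 * total);
        checkAgainstTotal(network, total);
        System.out.println("total=" + total + " network=" + network);
    }
}
```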

This message was sent by Atlassian JIRA
