flink-issues mailing list archives

From "Andrey Zagrebin (Jira)" <j...@apache.org>
Subject [jira] [Closed] (FLINK-15300) Shuffle memory fraction sanity check does not account for its min/max limit
Date Wed, 08 Jan 2020 09:04:00 GMT

     [ https://issues.apache.org/jira/browse/FLINK-15300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Andrey Zagrebin closed FLINK-15300.
    Resolution: Fixed

merged into master by d22fdc39a86496ebfc74914a72916d8a0ea7ab89
merged into 1.10 by a342e418a2d8df52645dd75588f8b9f74a07ad63

> Shuffle memory fraction sanity check does not account for its min/max limit
> ---------------------------------------------------------------------------
>                 Key: FLINK-15300
>                 URL: https://issues.apache.org/jira/browse/FLINK-15300
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Configuration
>            Reporter: Andrey Zagrebin
>            Assignee: Andrey Zagrebin
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 1.10.0
>          Time Spent: 20m
>  Remaining Estimate: 0h
> If a configuration results in the shuffle memory size being set to its min or max value, rather than to the configured fraction, during TM startup, then the starting TM parses the generated dynamic properties and fails in the sanity check (TaskExecutorResourceUtils#sanityCheckShuffleMemory), because the check expects the exact fraction even when the min/max value was applied.
> For example, start the TM with the following Flink config:
> {code:java}
> taskmanager.memory.total-flink.size: 350m
> taskmanager.memory.framework.heap.size: 16m
> taskmanager.memory.shuffle.fraction: 0.1{code}
> The calculation is based on the total Flink memory and results in the following extra program args:
> {code:java}
> taskmanager.memory.shuffle.max: 67108864b
> taskmanager.memory.framework.off-heap.size: 134217728b
> taskmanager.memory.managed.size: 146800642b
> taskmanager.cpu.cores: 1.0
> taskmanager.memory.task.heap.size: 2097150b
> taskmanager.memory.task.off-heap.size: 0b
> taskmanager.memory.shuffle.min: 67108864b{code}
> where the size derived from the fraction (0.1 * 350m = 35m) is less than the shuffle memory min size (64mb), so it was set to the min value: 64mb.
> When the TM starts, the calculation is now based on the explicit task heap and managed memory together with the explicit total Flink memory, and TaskExecutorResourceUtils#sanityCheckShuffleMemory throws the following exception:
> {code:java}
> org.apache.flink.configuration.IllegalConfigurationException:
> Derived Shuffle Memory size(64 Mb (67108864 bytes)) does not match configured Shuffle Memory fraction (0.10000000149011612).
> at org.apache.flink.runtime.clusterframework.TaskExecutorResourceUtils.sanityCheckShuffleMemory(TaskExecutorResourceUtils.java:552)
> at org.apache.flink.runtime.clusterframework.TaskExecutorResourceUtils.deriveResourceSpecWithExplicitTaskAndManagedMemory(TaskExecutorResourceUtils.java:183)
> at org.apache.flink.runtime.clusterframework.TaskExecutorResourceUtils.resourceSpecFromConfig(TaskExecutorResourceUtils.java:135)
> {code}
> This can be fixed by checking whether the size derived from the fraction is within the min/max range, and asserting the exact fraction only in that case.
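
The failing check and the proposed fix can be sketched in plain Java. This is a hypothetical, simplified model, not Flink's TaskExecutorResourceUtils code: the class and method names are invented, IllegalStateException stands in for IllegalConfigurationException, and the 1gb shuffle max used for the first derivation is an assumed default. It replays the example above: the fraction yields 35m, which is below the 64m minimum, so the derived size is 64m; a check that always expects the exact fraction rejects it, while a range-aware check accepts it.

```java
/**
 * Hypothetical, simplified model of the shuffle memory sanity check from FLINK-15300.
 * Not Flink's TaskExecutorResourceUtils code; names and exceptions are illustrative.
 */
public class ShuffleMemoryCheckSketch {

    static final long MB = 1024L * 1024L;

    /** Shuffle size implied by the fraction alone, before any min/max limit is applied. */
    static long sizeFromFraction(long totalFlinkBytes, double fraction) {
        return (long) (totalFlinkBytes * fraction);
    }

    /** Derived shuffle size: the fraction-based size clamped into [minBytes, maxBytes]. */
    static long deriveShuffleSize(long totalFlinkBytes, double fraction, long minBytes, long maxBytes) {
        return Math.max(minBytes, Math.min(maxBytes, sizeFromFraction(totalFlinkBytes, fraction)));
    }

    /** Behaviour described in the ticket: always demands the exact fraction. */
    static void strictCheck(long derivedBytes, long totalFlinkBytes, double fraction) {
        if (derivedBytes != sizeFromFraction(totalFlinkBytes, fraction)) {
            throw new IllegalStateException("Derived shuffle size " + derivedBytes
                    + " does not match configured fraction " + fraction);
        }
    }

    /** Proposed fix: demand the exact fraction only when it lies within [minBytes, maxBytes]. */
    static void rangeAwareCheck(long derivedBytes, long totalFlinkBytes, double fraction,
                                long minBytes, long maxBytes) {
        long fromFraction = sizeFromFraction(totalFlinkBytes, fraction);
        long expected;
        if (fromFraction < minBytes) {
            expected = minBytes;       // the fraction was overridden by the min limit
        } else if (fromFraction > maxBytes) {
            expected = maxBytes;       // the fraction was overridden by the max limit
        } else {
            expected = fromFraction;   // the fraction is within range, so it must match exactly
        }
        if (derivedBytes != expected) {
            throw new IllegalStateException("Derived shuffle size " + derivedBytes
                    + " does not match expected size " + expected);
        }
    }

    public static void main(String[] args) {
        long totalFlink = 350 * MB;  // taskmanager.memory.total-flink.size: 350m
        double fraction = 0.1;       // taskmanager.memory.shuffle.fraction: 0.1

        // First derivation (before TM start); the 1gb max is an assumed default.
        long shuffle = deriveShuffleSize(totalFlink, fraction, 64 * MB, 1024 * MB);
        System.out.println("from fraction: " + sizeFromFraction(totalFlink, fraction)); // 36700160 (35m)
        System.out.println("derived: " + shuffle);                                       // 67108864 (64m)

        // At TM start, the generated args pin shuffle min and max to the derived 64m.
        try {
            strictCheck(shuffle, totalFlink, fraction);
        } catch (IllegalStateException e) {
            System.out.println("strict check fails: " + e.getMessage());
        }
        rangeAwareCheck(shuffle, totalFlink, fraction, shuffle, shuffle);
        System.out.println("range-aware check passes");
    }
}
```

Comparing against the clamped value keeps the check exact whenever the fraction actually determined the size, while still accepting a size that a min or max limit overrode.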

This message was sent by Atlassian Jira
