flink-issues mailing list archives

From "Nico Kruber (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (FLINK-9636) Network buffer leaks in requesting a batch of segments during canceling
Date Mon, 02 Jul 2018 10:00:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-9636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16529613#comment-16529613 ]

Nico Kruber edited comment on FLINK-9636 at 7/2/18 9:59 AM:
------------------------------------------------------------

Actually, {{numRequiredBuffers}} is only a local variable in this method - why should we bother
changing it?

Also, if an {{InterruptedException}} occurs while polling memory segments from the {{availableMemorySegments}}
queue, it is re-thrown and the request fails - {{NetworkBufferPool}} should then
be restored to the state it was in before the request, which it is, isn't it?

I see only one point where the accounting for {{numTotalRequiredBuffers}} can be wrong: if
an exception is thrown in the first of the {{redistributeBuffers()}} calls. Tracing it further
down, this can only happen if {{SpillableSubpartition#releaseMemory()}} throws, e.g. due to
a failure in creating a {{spillWriter}}. I'm working on a patch...
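
To make the failure mode above concrete, here is a minimal toy model, not Flink's actual code. {{ToyReservation}}, {{Redistributor}}, {{reserveUnguarded}} and {{reserveWithRollback}} are hypothetical names invented for illustration; only {{numTotalRequiredBuffers}} and {{redistributeBuffers()}} come from the discussion. It assumes the counter is increased before redistribution runs, and the rollback variant is just one conceivable way to keep the counter consistent, not necessarily what the patch will do.

```java
import java.io.IOException;

/** Hypothetical stand-in for the redistributeBuffers() step, which may fail. */
interface Redistributor {
    void redistribute() throws IOException;
}

/** Toy model of the reservation accounting; this is NOT Flink's NetworkBufferPool. */
class ToyReservation {
    private int numTotalRequiredBuffers;
    private final Redistributor redistributor;

    ToyReservation(Redistributor redistributor) {
        this.redistributor = redistributor;
    }

    int getNumTotalRequiredBuffers() {
        return numTotalRequiredBuffers;
    }

    /** Increases the counter first; if redistribution throws, the increase is never undone. */
    void reserveUnguarded(int numRequiredBuffers) throws IOException {
        numTotalRequiredBuffers += numRequiredBuffers;
        redistributor.redistribute();
    }

    /** Assumed fix for illustration: undo the increase when redistribution fails. */
    void reserveWithRollback(int numRequiredBuffers) throws IOException {
        numTotalRequiredBuffers += numRequiredBuffers;
        try {
            redistributor.redistribute();
        } catch (IOException e) {
            numTotalRequiredBuffers -= numRequiredBuffers;
            throw e;
        }
    }
}
```

With a redistributor that always throws, {{reserveUnguarded(4)}} leaves the counter at 4 although no segment was handed out, while {{reserveWithRollback(4)}} leaves it at 0.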


> Network buffer leaks in requesting a batch of segments during canceling
> -----------------------------------------------------------------------
>
>                 Key: FLINK-9636
>                 URL: https://issues.apache.org/jira/browse/FLINK-9636
>             Project: Flink
>          Issue Type: Bug
>          Components: Network
>    Affects Versions: 1.5.0, 1.6.0
>            Reporter: zhijiang
>            Priority: Major
>             Fix For: 1.5.1
>
>
> In {{NetworkBufferPool#requestMemorySegments}}, {{numTotalRequiredBuffers}} is first increased
> by {{numRequiredBuffers}}.
> If an {{InterruptedException}} is thrown while polling segments from the available queue,
> the requested segments are recycled back to {{NetworkBufferPool}}, and {{numTotalRequiredBuffers}}
> is decreased by the number of polled segments, which is now inconsistent with {{numRequiredBuffers}}.
> So {{numTotalRequiredBuffers}} in {{NetworkBufferPool}} leaks in this case, and we can also
> decrease {{numRequiredBuffers}} to fix this bug.
>  
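
The scenario in the description can be sketched with a small toy model. This is not Flink's {{NetworkBufferPool}}: {{ToySegmentPool}}, {{interruptAfterPolls}} and {{releaseFullReservation}} are hypothetical names, and the interrupt is simulated rather than coming from a real blocking poll. The sketch assumes, as the description states, that the counter grows by {{numRequiredBuffers}} up front and that recycling shrinks it only by the number of segments actually polled. The {{releaseFullReservation}} flag shows one possible reading of the suggested fix, namely also giving back the part of the reservation that was never polled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Toy model of the accounting described in the report; NOT Flink's NetworkBufferPool. */
class ToySegmentPool {
    private final Queue<Object> availableSegments = new ArrayDeque<>();
    private int numTotalRequiredBuffers;
    // Hypothetical hook: simulate an interrupt after this many successful polls (-1 = never).
    private final int interruptAfterPolls;

    ToySegmentPool(int totalSegments, int interruptAfterPolls) {
        for (int i = 0; i < totalSegments; i++) {
            availableSegments.add(new Object());
        }
        this.interruptAfterPolls = interruptAfterPolls;
    }

    int getNumTotalRequiredBuffers() {
        return numTotalRequiredBuffers;
    }

    List<Object> requestSegments(int numRequiredBuffers, boolean releaseFullReservation)
            throws InterruptedException {
        // Reserve the whole batch up front.
        numTotalRequiredBuffers += numRequiredBuffers;
        List<Object> segments = new ArrayList<>(numRequiredBuffers);
        try {
            while (segments.size() < numRequiredBuffers) {
                if (segments.size() == interruptAfterPolls) {
                    throw new InterruptedException("simulated cancel");
                }
                segments.add(availableSegments.remove());
            }
            return segments;
        } catch (InterruptedException e) {
            // Recycling only accounts for the segments that were actually polled.
            recycle(segments);
            if (releaseFullReservation) {
                // Assumed fix: also release the part of the reservation never polled.
                numTotalRequiredBuffers -= numRequiredBuffers - segments.size();
            }
            throw e;
        }
    }

    void recycle(List<Object> segments) {
        numTotalRequiredBuffers -= segments.size();
        availableSegments.addAll(segments);
    }
}
```

Requesting 5 segments from a pool that is interrupted after 2 polls leaves the counter at 3 without the extra release, and at 0 with it.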



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
