flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-10367) Avoid recursion stack overflow during releasing SingleInputGate
Date Tue, 27 Nov 2018 15:41:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-10367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700598#comment-16700598 ]

ASF GitHub Bot commented on FLINK-10367:
----------------------------------------

zhijiangW commented on issue #6829: [FLINK-10367][network] Introduce NotificationResult for BufferListener to solve recursive stack overflow
URL: https://github.com/apache/flink/pull/6829#issuecomment-442104761
 
 
   Thanks for merging, @pnowojski.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Avoid recursion stack overflow during releasing SingleInputGate
> ---------------------------------------------------------------
>
>                 Key: FLINK-10367
>                 URL: https://issues.apache.org/jira/browse/FLINK-10367
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Network
>    Affects Versions: 1.5.0, 1.5.1, 1.5.2, 1.5.3, 1.6.0
>            Reporter: zhijiang
>            Assignee: zhijiang
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 1.6.3, 1.8.0
>
>
> When a task fails or is canceled, {{SingleInputGate#releaseAllResources}} is invoked before the task exits.
> In {{SingleInputGate#releaseAllResources}}, we first loop over all input channels to release them, then destroy the {{BufferPool}}. {{RemoteInputChannel#releaseAllResources}} returns its floating buffers to the {{BufferPool}}, which assigns each recycled buffer to another listener (another {{RemoteInputChannel}}).
> A recursive call can occur in this process. If that listener has already been released, it recycles the buffer straight back to the {{BufferPool}}, which then picks yet another listener to notify of the available buffer. This can repeat recursively.
> If many input channels are registered as listeners in the {{BufferPool}}, this recursion can cause a {{StackOverflowError}}. In our testing job, a scale of 10,000 input channels caused this error.
> I can think of two ways to solve this potential problem:
>  # When an input channel is released, it should notify the {{BufferPool}} to unregister it as a listener; otherwise the two are left in an inconsistent state.
>  # {{SingleInputGate}} should destroy the {{BufferPool}} first, then loop to release all of its internal input channels. Destroying the pool removes all of its listeners, so the input channels have no further interaction with it during {{RemoteInputChannel#releaseAllResources}}.
> I prefer the second way, because we do not want to add another interface method for removing a buffer listener; furthermore, the current internal data structure in {{BufferPool}} cannot remove a listener directly.
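To make the recursion concrete, here is a heavily simplified, hypothetical Java model. `Pool`, `Channel`, `Listener`, `Buffer` and `release()` are illustrative names only, not Flink's real classes or API, and the buffer-handoff rules are assumptions that mirror the description above. Waiting channels are registered as listeners, get released first, and stay registered; when the last channel returns its floating buffer, each released listener recycles it straight back, one stack frame deeper. Instead of provoking a real `StackOverflowError`, the sketch counts the maximum nesting of `recycle()` and compares the original ordering with the second option above (destroy the pool first).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the FLINK-10367 interaction; not Flink's actual classes.
public class RecycleRecursionDemo {

    /** Stand-in for a network buffer. */
    static final class Buffer {}

    /** Stand-in for a buffer listener waiting for a floating buffer. */
    interface Listener {
        void notifyBufferAvailable(Buffer buffer);
    }

    /** Stand-in for a buffer pool that hands recycled buffers to waiting listeners. */
    static final class Pool {
        final ArrayDeque<Listener> listeners = new ArrayDeque<>();
        final List<Buffer> available = new ArrayList<>();
        boolean destroyed;
        int depth;     // current nesting of recycle() calls
        int maxDepth;  // deepest nesting observed

        void recycle(Buffer buffer) {
            depth++;
            maxDepth = Math.max(maxDepth, depth);
            try {
                Listener next = destroyed ? null : listeners.poll();
                if (next != null) {
                    // Hand the buffer directly to the next registered listener.
                    next.notifyBufferAvailable(buffer);
                } else if (!destroyed) {
                    available.add(buffer);
                }
                // Assumption of this model: a destroyed pool simply drops the buffer.
            } finally {
                depth--;
            }
        }

        void destroy() {
            destroyed = true;
            listeners.clear(); // no notifications after destruction
            available.clear();
        }
    }

    /** Stand-in for a remote input channel holding at most one floating buffer. */
    static final class Channel implements Listener {
        final Pool pool;
        Buffer floating;
        boolean released;

        Channel(Pool pool, Buffer initial) {
            this.pool = pool;
            this.floating = initial;
        }

        @Override
        public void notifyBufferAvailable(Buffer buffer) {
            if (released) {
                // Already released but still registered: give the buffer back,
                // which notifies another listener one stack frame deeper.
                pool.recycle(buffer);
            } else {
                floating = buffer;
            }
        }

        void releaseAllResources() {
            released = true;
            Buffer b = floating;
            floating = null;
            if (b != null) {
                pool.recycle(b);
            }
        }
    }

    /** Releases all channels and returns the deepest recycle() nesting observed. */
    static int release(int waitingChannels, boolean destroyPoolFirst) {
        Pool pool = new Pool();
        List<Channel> channels = new ArrayList<>();
        for (int i = 0; i < waitingChannels; i++) {
            Channel waiting = new Channel(pool, null); // no buffer yet
            pool.listeners.add(waiting);               // registered as a listener
            channels.add(waiting);
        }
        channels.add(new Channel(pool, new Buffer())); // holds a buffer, released last

        if (destroyPoolFirst) {
            pool.destroy();
        }
        for (Channel c : channels) {
            c.releaseAllResources();
        }
        if (!destroyPoolFirst) {
            pool.destroy();
        }
        return pool.maxDepth;
    }

    public static void main(String[] args) {
        System.out.println("release channels, then destroy pool: max depth = " + release(1000, false));
        System.out.println("destroy pool, then release channels: max depth = " + release(1000, true));
    }
}
```

Running `main` prints a maximum depth of 1001 for the original ordering and 1 when the pool is destroyed first: the recursion depth grows with the number of released-but-still-registered listeners, so with enough of them the original ordering would exhaust the thread stack (the exact threshold depends on the JVM's stack size).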



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
