flink-issues mailing list archives

From "Yingjie Cao (Jira)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-14872) Potential deadlock for task reading from blocking ResultPartition.
Date Thu, 21 Nov 2019 11:52:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-14872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979191#comment-16979191 ]

Yingjie Cao commented on FLINK-14872:

[~pnowojski] The two issues are related and their causes are similar, though not exactly the
same. Even with the timeout fix, the deadlock can still occur.

The issue reported in FLINK-12852 occurs because the downstream relies on the upstream to release
Buffers, but the upstream relies on the downstream to consume data before it can recycle Buffers.

The issue reported in this Jira occurs because the ResultPartition relies on the InputGate to release
Buffers, but the InputGate relies on its data being processed and emitted to the ResultPartition.

> Potential deadlock for task reading from blocking ResultPartition.
> ------------------------------------------------------------------
>                 Key: FLINK-14872
>                 URL: https://issues.apache.org/jira/browse/FLINK-14872
>             Project: Flink
>          Issue Type: Bug
>            Reporter: Yingjie Cao
>            Priority: Blocker
>             Fix For: 1.10.0
> Currently, the buffer pool size of an InputGate reading from a blocking ResultPartition is
unbounded, so the InputGate can take too many buffers. As a result, the ResultPartition
of the same task may be unable to acquire enough core buffers, which finally leads to deadlock.
> Consider the following case:
> Core buffers are reserved for the InputGate and the ResultPartition -> the InputGate consumes
many Buffers (not including the buffers reserved for the ResultPartition) -> other tasks acquire
exclusive buffers for their InputGates and trigger a redistribution of Buffers (the Buffers taken by
the first InputGate cannot be released) -> the first task, whose InputGate uses many buffers,
begins to emit records but cannot acquire enough core Buffers (some operators may not emit
records immediately, or there may be nothing to emit) -> deadlock.
> I think we can fix this problem by limiting the number of Buffers that can be allocated by an
InputGate which reads from a blocking ResultPartition.
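
The effect of the proposed limit can be illustrated with a toy model. The sketch below is
not Flink code: SharedPool, LocalPool, and outputCanProgress are hypothetical names, and the
model ignores reserved core buffers and redistribution. It only shows that an input-side pool
without an upper bound can drain a shared pool so that the output side of the same task cannot
obtain a buffer, while a capped input-side pool leaves room for it.

```java
public class BufferCapSketch {

    /** Toy stand-in for a shared network buffer pool (hypothetical, not Flink's class). */
    static final class SharedPool {
        private int available;

        SharedPool(int total) {
            this.available = total;
        }

        /** Takes one buffer if any is left; never blocks. */
        boolean tryTake() {
            if (available == 0) {
                return false;
            }
            available--;
            return true;
        }
    }

    /** Toy local pool with an upper bound on the number of buffers it may hold. */
    static final class LocalPool {
        private final SharedPool shared;
        private final int maxBuffers; // Integer.MAX_VALUE models "unbounded"
        private int held;

        LocalPool(SharedPool shared, int maxBuffers) {
            this.shared = shared;
            this.maxBuffers = maxBuffers;
        }

        /** Requests one buffer, respecting the local cap and the shared pool. */
        boolean tryRequest() {
            if (held >= maxBuffers || !shared.tryTake()) {
                return false;
            }
            held++;
            return true;
        }
    }

    /**
     * Lets the input side take every buffer it is allowed to, then checks whether
     * the output side of the same task can still get one buffer to emit records.
     */
    public static boolean outputCanProgress(int totalBuffers, int inputGateCap) {
        SharedPool shared = new SharedPool(totalBuffers);
        LocalPool inputGate = new LocalPool(shared, inputGateCap);
        LocalPool resultPartition = new LocalPool(shared, Integer.MAX_VALUE);

        while (inputGate.tryRequest()) {
            // Input side buffers data greedily, as when reading a blocking partition.
        }
        return resultPartition.tryRequest();
    }

    public static void main(String[] args) {
        // Unbounded input side: nothing is left for the output side.
        System.out.println("unbounded gate: " + outputCanProgress(10, Integer.MAX_VALUE));
        // Input side capped at 8 of 10 buffers: the output side can still proceed.
        System.out.println("gate capped at 8: " + outputCanProgress(10, 8));
    }
}
```

In this model the first call prints false and the second prints true. In a real task the failed
request would block instead of returning false, which is where the deadlock comes from.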

This message was sent by Atlassian Jira
