flink-issues mailing list archives

From "zhijiang (Jira)" <j...@apache.org>
Subject [jira] [Updated] (FLINK-14472) Implement back-pressure monitor with non-blocking outputs
Date Tue, 22 Oct 2019 06:52:00 GMT

     [ https://issues.apache.org/jira/browse/FLINK-14472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

zhijiang updated FLINK-14472:
-----------------------------
    Description: 
Currently the back-pressure monitor relies on detecting task threads that are stuck in `requestBufferBuilderBlocking`.
There are currently two cases that cause back-pressure:
 * There are no available buffers in `LocalBufferPool` and the quota granted by the global
pool is also exhausted. The task must then wait for buffers to be recycled to `LocalBufferPool`.
 * There are no available buffers in `LocalBufferPool`, but the quota has not been used up. The
request to the global pool blocks because the global pool has no available buffers either. The
task must then wait for buffers to be recycled to the global pool.

The non-blocking network output is being implemented in FLINK-14396, so the back-pressure
monitor should be adjusted accordingly once the non-blocking output is used in practice.

Specifically, we want to stop deriving back-pressure from an analysis of the task thread stack,
an approach with drawbacks that have been discussed before:
 * If `requestBuffer` is not triggered by the task thread, the current monitor does not work
in practice.
 * The current monitor is heavy-weight and fragile because it needs to understand implementation
details of `LocalBufferPool`.

We could provide a transparent method that lets the monitor caller get the back-pressure result
directly, hiding the implementation details inside `LocalBufferPool`.
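
As a rough illustration of the idea, and not the actual Flink implementation, the sketch below models a local pool that answers the back-pressure question itself, so a monitor never has to sample thread stacks. All names here (`BackPressureSketch`, `GlobalPool`, `LocalPool`, `isBackPressured`) are hypothetical and chosen only for this example; the real `LocalBufferPool` API may look quite different.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch only; none of these types are Flink classes. */
public class BackPressureSketch {

    /** Simulated global pool, reduced to a counter of free buffers. */
    public static final class GlobalPool {
        private int free;

        public GlobalPool(int free) {
            this.free = free;
        }

        /** Takes one buffer if any is free; never blocks. */
        public synchronized boolean tryTake() {
            if (free == 0) {
                return false;
            }
            free--;
            return true;
        }

        public synchronized boolean isEmpty() {
            return free == 0;
        }
    }

    /** Simulated local pool with a quota on buffers taken from the global pool. */
    public static final class LocalPool {
        private final GlobalPool global;
        private final int quota;
        private int taken; // buffers obtained from the global pool so far
        private final Deque<byte[]> available = new ArrayDeque<>();

        public LocalPool(GlobalPool global, int quota) {
            this.global = global;
            this.quota = quota;
        }

        /** Non-blocking request: returns null instead of parking the caller. */
        public synchronized byte[] requestBuffer() {
            if (!available.isEmpty()) {
                return available.pop();
            }
            if (taken < quota && global.tryTake()) {
                taken++;
                return new byte[32];
            }
            return null;
        }

        /** Returns a buffer to this local pool. */
        public synchronized void recycle(byte[] buffer) {
            available.push(buffer);
        }

        /**
         * Single entry point for a monitor. True when a request issued now
         * would fail: no local buffer, and either the quota is used up
         * (first case) or the global pool is empty (second case).
         */
        public synchronized boolean isBackPressured() {
            if (!available.isEmpty()) {
                return false;
            }
            return taken >= quota || global.isEmpty();
        }
    }

    public static void main(String[] args) {
        GlobalPool global = new GlobalPool(1);
        LocalPool pool = new LocalPool(global, 2);

        System.out.println("before request: " + pool.isBackPressured());
        byte[] buffer = pool.requestBuffer();
        System.out.println("global pool drained: " + pool.isBackPressured());
        pool.recycle(buffer);
        System.out.println("after recycle: " + pool.isBackPressured());
    }
}
```

The monitor only calls `isBackPressured()`, so it works regardless of which thread requests buffers and does not depend on how the pool tracks its quota internally.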

  was:
Currently the back-pressure monitor relies on detecting task threads that are stuck in `requestBufferBuilderBlocking`.
There are currently two cases that cause back-pressure:
 * There are no available buffers in `LocalBufferPool` and the quota granted by the global
pool is also exhausted. The task must then wait for buffers to be recycled to `LocalBufferPool`.
 * There are no available buffers in `LocalBufferPool`, but the quota has not been used up. The
request to the global pool blocks because the global pool has no available buffers either. The
task must then wait for buffers to be recycled to the global pool.

We already implemented the non-blocking output for the first case in [FLINK-14396|https://issues.apache.org/jira/browse/FLINK-14396], and
we expect the second case to be done together with adjusting the back-pressure monitor, which could
check `RecordWriter#isAvailable` instead.


> Implement back-pressure monitor with non-blocking outputs
> ---------------------------------------------------------
>
>                 Key: FLINK-14472
>                 URL: https://issues.apache.org/jira/browse/FLINK-14472
>             Project: Flink
>          Issue Type: Task
>          Components: Runtime / Network
>            Reporter: zhijiang
>            Assignee: Yingjie Cao
>            Priority: Minor
>             Fix For: 1.10.0
>
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
