[ https://issues.apache.org/jira/browse/FLINK-9755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16541389#comment-16541389
]
ASF GitHub Bot commented on FLINK-9755:
---------------------------------------
Github user NicoK commented on a diff in the pull request:
https://github.com/apache/flink/pull/6272#discussion_r201971896
--- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/consumer/RemoteInputChannel.java
---
@@ -360,32 +360,44 @@ public boolean notifyBufferAvailable(Buffer buffer) {
return false;
}
- boolean needMoreBuffers = false;
- synchronized (bufferQueue) {
- checkState(isWaitingForFloatingBuffers, "This channel should be waiting for floating
buffers.");
+ boolean recycleBuffer = true;
+ try {
+ boolean needMoreBuffers = false;
+ synchronized (bufferQueue) {
+ checkState(isWaitingForFloatingBuffers,
+ "This channel should be waiting for floating buffers.");
+
+ // Important: double check the isReleased state inside synchronized block, so there
is no
+ // race condition when notifyBufferAvailable and releaseAllResources running in parallel.
+ if (isReleased.get() ||
+ bufferQueue.getAvailableBufferSize() >= numRequiredBuffers) {
+ isWaitingForFloatingBuffers = false;
+ buffer.recycleBuffer();
+ return false;
+ }
- // Important: double check the isReleased state inside synchronized block, so there
is no
- // race condition when notifyBufferAvailable and releaseAllResources running in parallel.
- if (isReleased.get() || bufferQueue.getAvailableBufferSize() >= numRequiredBuffers)
{
- isWaitingForFloatingBuffers = false;
- buffer.recycleBuffer();
- return false;
- }
+ recycleBuffer = false;
+ bufferQueue.addFloatingBuffer(buffer);
- bufferQueue.addFloatingBuffer(buffer);
+ if (bufferQueue.getAvailableBufferSize() == numRequiredBuffers) {
+ isWaitingForFloatingBuffers = false;
+ } else {
+ needMoreBuffers = true;
+ }
- if (bufferQueue.getAvailableBufferSize() == numRequiredBuffers) {
- isWaitingForFloatingBuffers = false;
- } else {
- needMoreBuffers = true;
+ if (unannouncedCredit.getAndAdd(1) == 0) {
+ notifyCreditAvailable();
+ }
}
- }
- if (unannouncedCredit.getAndAdd(1) == 0) {
- notifyCreditAvailable();
+ return needMoreBuffers;
+ } catch (Throwable t) {
+ if (recycleBuffer) {
+ buffer.recycleBuffer();
+ }
+ setError(t);
+ return false;
--- End diff --
The protection (currently) is against the `checkState` here and in `notifyCreditAvailable()`
(the latter also has some external calls).
We could return `needMoreBuffers` here but I actually see this exception as if it happened
in the responsible thread and therefore not continue to use this listener.
> Exceptions in RemoteInputChannel#notifyBufferAvailable() are not propagated to the responsible
thread
> -----------------------------------------------------------------------------------------------------
>
> Key: FLINK-9755
> URL: https://issues.apache.org/jira/browse/FLINK-9755
> Project: Flink
> Issue Type: Bug
> Components: Network
> Affects Versions: 1.5.0
> Reporter: Nico Kruber
> Assignee: Nico Kruber
> Priority: Critical
> Labels: pull-request-available
> Fix For: 1.5.2, 1.6.0
>
>
> The credit-based flow control implementation of RemoteInputChannel#notifyBufferAvailable()
does not forward errors (like the {{IllegalStateException}}) to the thread that is being notified.
The calling code at {{LocalBufferPool#recycle}}, however, relies on the callback forwarding
errors and completely ignores any failures.
> Therefore, we could end up with a program waiting forever for the callback and not even
a failure message in the logs.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
|