flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ufuk Celebi (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLINK-1093) Add non-multiplexing NetworkConnectionManager
Date Sun, 07 Sep 2014 10:49:28 GMT
Ufuk Celebi created FLINK-1093:

             Summary: Add non-multiplexing NetworkConnectionManager
                 Key: FLINK-1093
                 URL: https://issues.apache.org/jira/browse/FLINK-1093
             Project: Flink
          Issue Type: Improvement
          Components: Distributed Runtime
    Affects Versions: 0.7-incubating
            Reporter: Ufuk Celebi
            Priority: Minor

When translating a JobGraph to an ExecutionGraph, edges are assigned connection indexes to
ensure deadlock freedom of the network stack. Deadlocks can easily occur when multiplexing
multiple logical channels over a single TCP connection, because of the limited network buffers.
Each incoming envelope, which has an attached buffer, needs to request a network buffer from
the receiver's input buffer pool. If the receiver does not have a buffer available, it unsubscribes
from the TCP connection until a new buffer is available. This unsubscribe then affects all
other logical channels, which are multiplexed over the same TCP connection. With certain user
programs this can result in a deadlock.

One of the problems when running in multiplexed mode is the question of when to close TCP
connections in a reliable way without loosing or re-ordering network envelopes (as evident
in FLINK-1063). In non-multiplexed mode this is trivial, because every logical channel is
mapped to a phyiscal channel and the close of the logical channel translates to a close of
the physical channel.

However, this approach results in limited scalability as high degrees of parallelism result
in a high number of TCP connections per machine, which are continuously opened and closed.
Still, it is useful as a baseline and should work fine with smaller setups.

This message was sent by Atlassian JIRA

View raw message