From GitBox <...@apache.org>
Subject [GitHub] [flink] zhijiangW opened a new pull request #9905: [FLINK-14396][network] Implement rudimentary non-blocking network output
Date Tue, 15 Oct 2019 12:02:15 GMT
zhijiangW opened a new pull request #9905: [FLINK-14396][network] Implement rudimentary non-blocking
network output
URL: https://github.com/apache/flink/pull/9905
   ## What is the purpose of the change
   To support the mailbox model and the future requirements of unaligned checkpoints, task network
output should be non-blocking. In other words, as long as the output reports itself as available,
writing the next single record should never block.
   This first version implements non-blocking output only for the most common case. It does not
address the following cases, which keep the previous behavior:
       1. Large records that might span multiple buffers
       2. Flatmap-like operators that might emit multiple records per invocation
       3. Broadcast watermarks that might request multiple buffers at a time
   The solution provides a `RecordWriter#isAvailable` method, backed by a corresponding
`LocalBufferPool#isAvailable`, so that output availability can be checked beforehand. As long as
there is at least one available buffer in `LocalBufferPool`, the `RecordWriter` is available for
network output in most cases. This PR doesn’t include the runtime handling of this non-blocking
and availability behavior in `StreamInputProcessor`.
   Note: This requires raising the minimum number of buffers in the output `LocalBufferPool` to
(numberOfSubpartitions + 1) and also adjusting the monitoring of the backpressure future.
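
   The availability check described above can be illustrated with a minimal sketch. It is not
the code in this PR: `AvailabilitySketch`, `SketchBufferPool`, `SketchRecordWriter`, and the
`requestBuffer`/`recycle` methods are hypothetical names, and returning a `CompletableFuture`
from `isAvailable` is an assumption made for illustration. The pool is available while it holds
at least one free buffer, and the writer simply delegates to the pool.

```java
import java.util.ArrayDeque;
import java.util.concurrent.CompletableFuture;

/** Hypothetical sketch only; these are not Flink's LocalBufferPool or RecordWriter classes. */
final class AvailabilitySketch {

    /** Toy buffer pool that reports availability through a future instead of blocking. */
    static final class SketchBufferPool {
        private static final CompletableFuture<Void> AVAILABLE =
                CompletableFuture.completedFuture(null);

        private final ArrayDeque<byte[]> freeBuffers = new ArrayDeque<>();
        private CompletableFuture<Void> availability = AVAILABLE;

        SketchBufferPool(int numBuffers, int bufferSize) {
            for (int i = 0; i < numBuffers; i++) {
                freeBuffers.add(new byte[bufferSize]);
            }
            if (freeBuffers.isEmpty()) {
                availability = new CompletableFuture<>();
            }
        }

        /** Completed future: at least one buffer is free. Pending future: the pool is empty. */
        synchronized CompletableFuture<Void> isAvailable() {
            return availability;
        }

        /** Returns a free buffer, or null if none is left. Never blocks. */
        synchronized byte[] requestBuffer() {
            byte[] buffer = freeBuffers.poll();
            if (freeBuffers.isEmpty() && availability.isDone()) {
                // The last free buffer was just handed out: switch to the unavailable state.
                availability = new CompletableFuture<>();
            }
            return buffer;
        }

        /** Gives a buffer back and completes the pending future, if there is one. */
        void recycle(byte[] buffer) {
            CompletableFuture<Void> toComplete;
            synchronized (this) {
                freeBuffers.add(buffer);
                toComplete = availability;
                availability = AVAILABLE;
            }
            // Complete outside the lock so that callbacks do not run while holding it.
            toComplete.complete(null);
        }
    }

    /** Toy writer whose availability is the availability of its buffer pool. */
    static final class SketchRecordWriter {
        private final SketchBufferPool pool;

        SketchRecordWriter(SketchBufferPool pool) {
            this.pool = pool;
        }

        CompletableFuture<Void> isAvailable() {
            return pool.isAvailable();
        }
    }

    public static void main(String[] args) {
        SketchBufferPool pool = new SketchBufferPool(1, 32);
        SketchRecordWriter writer = new SketchRecordWriter(pool);

        System.out.println("available at start: " + writer.isAvailable().isDone());
        byte[] buffer = pool.requestBuffer();
        System.out.println("available with pool drained: " + writer.isAvailable().isDone());
        pool.recycle(buffer);
        System.out.println("available after recycle: " + writer.isAvailable().isDone());
    }
}
```

   In this sketch a caller would check `writer.isAvailable().isDone()` before emitting a record
and otherwise register a callback on the returned future rather than blocking the task thread.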
   ## Brief change log
     - Refactor to extract the `AbstractAvailabilityProvider` class from the `InputGate` logic.
     - Provide the `#isAvailable` method for `RecordWriter`, `ResultPartitionWriter` and `BufferPool`.
     - Refactor the relevant implementations in `LocalBufferPool` to support updating
the available state.
   ## Verifying this change
   Added a new test, `RecordWriterTest#testIsAvailableOrNot`, to verify the changes.
   ## Does this pull request potentially affect one of the following parts:
     - Dependencies (does it add or upgrade a dependency): (yes / `no`)
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes
/ `no`)
     - The serializers: (yes / **no** / don't know)
     - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know)
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing,
Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
     - The S3 file system connector: (yes / **no** / don't know)
   ## Documentation
     - Does this pull request introduce a new feature? (yes / **no**)
     - If yes, how is the feature documented? (**not applicable** / docs / JavaDocs / not

