flink-issues mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [flink] zhijiangW commented on issue #7186: [FLINK-10941] Keep slots which contain unconsumed result partitions
Date Thu, 07 Mar 2019 13:56:53 GMT
zhijiangW commented on issue #7186: [FLINK-10941] Keep slots which contain unconsumed result
partitions
URL: https://github.com/apache/flink/pull/7186#issuecomment-470534536
 
 
   I think this PR contains two separate improvements:
   
   1. When to release a partition on the TM side. I explained this issue above, and it might have
three levels. The first is based on transport finishing on the producer side. The second is based
on consumption finishing on the consumer side. The third is based on other mechanisms, for example
when one partition is consumed multiple times. The current behavior is the first level, based on
transport finishing, and this PR moves it to the second level, based on consumption finishing.
I acknowledge that the second level has benefits in some scenarios, for example the producer does
not need to be restarted to re-produce the data if the consumer fails while processing it.
But the preconditions are that the partition data is saved in a persistent file and that
`FailoverStrategy` makes use of this feature. So I think it is not the proper time to improve it
now or in this PR. Furthermore, it could delay the release of TM resources if the consumer is slow
in processing the data.
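
   To make the three levels concrete, here is a minimal sketch. All names in it (`ReleaseLevel`,
`PartitionProgress`, `ReleasePolicy.mayRelease`) are hypothetical and are not Flink classes; it only
illustrates how the release condition differs per level, assuming each partition exposes three
boolean flags.

```java
// Hypothetical illustration only; none of these types exist in Flink.
enum ReleaseLevel {
    TRANSPORT_FINISHED,    // level 1: the producer has sent all data over the network
    CONSUMPTION_FINISHED,  // level 2: the consumer has finished processing the data
    EXTERNAL_SIGNAL        // level 3: another mechanism decides, e.g. multiple consumers
}

// Assumed per-partition flags; a real partition would track more state.
final class PartitionProgress {
    final boolean transportFinished;
    final boolean consumptionFinished;
    final boolean externalReleaseRequested;

    PartitionProgress(boolean transportFinished,
                      boolean consumptionFinished,
                      boolean externalReleaseRequested) {
        this.transportFinished = transportFinished;
        this.consumptionFinished = consumptionFinished;
        this.externalReleaseRequested = externalReleaseRequested;
    }
}

final class ReleasePolicy {
    // Returns true if the partition may be released under the given level.
    static boolean mayRelease(ReleaseLevel level, PartitionProgress p) {
        switch (level) {
            case TRANSPORT_FINISHED:
                return p.transportFinished;
            case CONSUMPTION_FINISHED:
                return p.transportFinished && p.consumptionFinished;
            case EXTERNAL_SIGNAL:
                return p.externalReleaseRequested;
            default:
                throw new IllegalArgumentException("Unknown level: " + level);
        }
    }
}
```

   With a slow consumer (transport finished, consumption not yet finished), the first level already
allows the release while the second level still holds the partition, which is the resource delay
mentioned above.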
   
   2. When to release a TM on the RM side. The current release of a TM does not consider the
partition status, which results in unnecessary failures. I think this issue is the motivation of
this PR. We can simply fix it in the way currently proposed: when a partition finishes transporting
over the network, it is released from the TM. When the RM tries to release a TM, it asks the TM
whether it can be released, and the TM loops over all its internal partitions and checks their
released status to return the decision. This process in the PR can solve the current problem
if we retain the previous logic of releasing partitions. But I have some concerns about the
form of the current implementation. If I understand correctly, the RM should decide whether to
release a TM based on the status of all its internal slots. This is the basic rule. A more proper
way might be that the partition notifies its finished/released status, like a task notifies its
finished state to the JM, and then the RM decides when/whether to release the TM. If we introduce
the `TM#canBeReleased` interface now, it seems that the TM decides whether to release itself.
Based on the ShuffleMaster/ShuffleService architecture, partition state notification and query
seem more natural. But currently the `TM#canBeReleased` approach seems easier to focus on and get
working.
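
   The two designs compared above can be sketched roughly as follows. This is only an illustration
under my own assumptions: `SketchPartition`, `SketchTaskManager` and `SketchPartitionTracker` are
hypothetical names, not the classes in this PR or in Flink, and TM and partition ids are plain
strings. Variant A is the pull style behind `TM#canBeReleased`; variant B is the notification style,
where the deciding side keeps the partition status itself.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a result partition held by a TM.
final class SketchPartition {
    private boolean released;

    void markReleased() { released = true; }

    boolean isReleased() { return released; }
}

// Variant A (pull): the RM asks the TM, and the TM answers by scanning its partitions.
final class SketchTaskManager {
    private final List<SketchPartition> partitions = new ArrayList<>();

    SketchPartition addPartition() {
        SketchPartition p = new SketchPartition();
        partitions.add(p);
        return p;
    }

    // The TM may be released only if every partition has been released.
    boolean canBeReleased() {
        for (SketchPartition p : partitions) {
            if (!p.isReleased()) {
                return false;
            }
        }
        return true;
    }
}

// Variant B (push): partitions report their status, and the tracker on the
// deciding side (RM or a shuffle master) answers without calling the TM.
final class SketchPartitionTracker {
    private final Map<String, Set<String>> unreleasedByTm = new HashMap<>();

    void partitionCreated(String tmId, String partitionId) {
        unreleasedByTm.computeIfAbsent(tmId, k -> new HashSet<>()).add(partitionId);
    }

    void partitionReleased(String tmId, String partitionId) {
        Set<String> unreleased = unreleasedByTm.get(tmId);
        if (unreleased != null) {
            unreleased.remove(partitionId);
            if (unreleased.isEmpty()) {
                unreleasedByTm.remove(tmId);
            }
        }
    }

    // The TM may be released once no unreleased partition is tracked for it.
    boolean mayReleaseTaskManager(String tmId) {
        return !unreleasedByTm.containsKey(tmId);
    }
}
```

   In variant A the decision effectively lives in the TM; in variant B the TM only reports facts
and the decision stays with the component that owns the release logic.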
   
   Maybe @azagrebin and @pnowojski could make the final decision. I do not mind refactoring
the related process in the future if this PR could be merged early. :)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
