flink-issues mailing list archives

From beyond1920 <...@git.apache.org>
Subject [GitHub] flink issue #2410: [FLINK-4449] [cluster management] Heartbeat Manager betwe...
Date Thu, 25 Aug 2016 02:43:20 GMT
Github user beyond1920 commented on the issue:

    Hi Till, thanks a lot for the review and the good advice. I agree that we should first
define what it should look like. Here are my thoughts on your questions.
    1. Exponential backoff strategy.
    In fact, it is not a full exponential backoff. It is more like 'Math.min(2 * timeoutMillis, maxHeartbeatTimeout)'.
Maybe we could use maxHeartbeatTimeout to reduce the risk of waiting twice as long as defined
before being notified about a heartbeat failure.
    Alternatively, we could use a constant retry period instead of a backoff strategy.
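
    To make the capped doubling concrete, here is a minimal sketch. Only the expression 'Math.min(2 * timeoutMillis, maxHeartbeatTimeout)' comes from the discussion; the class name CappedBackoff, the method nextTimeout and the sample values are hypothetical and are not taken from the pull request.

```java
// Hypothetical helper for illustration; not code from the pull request.
final class CappedBackoff {
    private CappedBackoff() {}

    /** Doubles the current timeout but never returns more than maxHeartbeatTimeout. */
    static long nextTimeout(long timeoutMillis, long maxHeartbeatTimeout) {
        return Math.min(2 * timeoutMillis, maxHeartbeatTimeout);
    }

    public static void main(String[] args) {
        long timeout = 1_000L; // assumed initial timeout
        long max = 5_000L;     // assumed cap
        for (int attempt = 1; attempt <= 4; attempt++) {
            System.out.println("attempt " + attempt + ": " + timeout + " ms");
            timeout = nextTimeout(timeout, max);
        }
    }
}
```

    With these sample values the timeouts are 1000, 2000, 4000 and 5000 ms, so the wait stops growing once it reaches the cap. A constant retry period would be the special case where the initial timeout already equals the cap.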
    2. Should every heartbeat connection be responsible for triggering itself, or should
the heartbeat manager be responsible for that?
    A heartbeat scheduler does not trigger itself. It relies on the outside world (here I mean
the HeartbeatManager) to call its start method to trigger it.
    3. Is the heartbeat receiving end an independent RpcEndpoint? How does the payload delivery
work? Does the sender side ask for the result (future), or does the receiving side answer
via a tell message to the heartbeat manager?
    From the sender's point of view, the receiving end is a gateway that can be obtained by its
address. The sender side asks the receiver for the heartbeat payload.
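
    The ask style could look roughly like the sketch below. It assumes the gateway returns a future for the payload; the names HeartbeatTargetGateway, requestHeartbeat, HeartbeatSender and askOnce are hypothetical, and a plain java.util.concurrent.CompletableFuture stands in for whatever future type the RPC layer actually provides.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical gateway; in the design discussed it would be resolved from the receiver's address.
interface HeartbeatTargetGateway {
    CompletableFuture<String> requestHeartbeat();
}

// Hypothetical sender-side helper that asks the receiver and waits for the answer.
final class HeartbeatSender {
    private final HeartbeatTargetGateway target;
    private final long timeoutMillis;

    HeartbeatSender(HeartbeatTargetGateway target, long timeoutMillis) {
        this.target = target;
        this.timeoutMillis = timeoutMillis;
    }

    /** Asks the receiver for its payload; returns null if no answer arrives within the timeout. */
    String askOnce() {
        try {
            return target.requestHeartbeat().get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return null;
        } catch (TimeoutException | ExecutionException e) {
            return null; // treated as a missed heartbeat in this sketch
        }
    }
}
```

    The blocking get is only there to keep the sketch short. A real scheduler would more likely attach a callback to the future instead of blocking a thread.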
    4. How does the receiving end monitor the sender so that, if the heartbeat request is not
delivered, the receiving end can mark the sending end as dead?
    I think this could be independent of the heartbeat manager on the sending side. It should run
on the receiving end, while the heartbeat scheduler runs on the sending side.
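
    One possible shape for such a receiver-side monitor is sketched below. Everything in it is an assumption for illustration: the class name HeartbeatMonitor, its methods, and the choice to pass the current time in explicitly so that the logic stays independent of any scheduler.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical receiver-side monitor; it has no dependency on the sender's heartbeat manager.
final class HeartbeatMonitor {
    private final long timeoutMillis;
    // Sender id -> timestamp (ms) of the last heartbeat request received from that sender.
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();

    HeartbeatMonitor(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Records that a heartbeat request from senderId arrived at nowMillis. */
    void onHeartbeatRequest(String senderId, long nowMillis) {
        lastSeen.put(senderId, nowMillis);
    }

    /** A sender counts as dead if it was never seen or has been silent longer than the timeout. */
    boolean isDead(String senderId, long nowMillis) {
        Long seen = lastSeen.get(senderId);
        return seen == null || nowMillis - seen > timeoutMillis;
    }
}
```

    The receiving end would call onHeartbeatRequest whenever a request arrives and check isDead periodically from its own timer.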
    What's your advice?

