geode-user mailing list archives

From Bruce Schuchardt
Subject Re: Member isn't responding to heartbeat requests
Date Mon, 25 Feb 2019 15:56:30 GMT
In a distributed system, nodes (servers, locators) are continually 
watching other nodes to ensure that something bad hasn't happened.  One 
of the ways this is done in Geode is for each node to watch one other 
node and expect periodic signs that it's still alive.  This is done 
through TCP messaging.  Any message from the node being watched counts 
as proof that it's still alive.  If no messages are seen within the 
"member-timeout" period (see Distributed System settings, default 
5000ms) then a "heartbeat" is requested over UDP.  If no message is 
received in another "member-timeout" interval, we attempt to contact 
the suspect directly over a TCP/IP connection, requesting that it verify 
its identity.  If this fails, the suspect is kicked out of the cluster.
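
To make the timing concrete, here is a rough sketch of the escalation described above. It is not Geode's actual code: the class, enum and method names are made up for illustration, and the only assumption is the sequence just described (one member-timeout of silence triggers a heartbeat request, a second one triggers the final direct check).

```java
import java.time.Duration;

public class FailureDetectionSketch {
    // Hypothetical stage names for illustration; these are not Geode types.
    enum Stage { HEALTHY, HEARTBEAT_REQUESTED, FINAL_CHECK }

    // Maps how long the watched node has been silent to the stage described above.
    static Stage stageAfterSilence(Duration silence, Duration memberTimeout) {
        if (silence.compareTo(memberTimeout) < 0) {
            return Stage.HEALTHY;               // a message arrived recently enough
        }
        if (silence.compareTo(memberTimeout.multipliedBy(2)) < 0) {
            return Stage.HEARTBEAT_REQUESTED;   // first timeout passed: ask for a heartbeat
        }
        return Stage.FINAL_CHECK;               // second timeout passed: direct connection attempt
    }

    public static void main(String[] args) {
        Duration memberTimeout = Duration.ofMillis(5000); // default from the text above
        for (long ms : new long[] {1000, 6000, 11000}) {
            System.out.println(ms + " ms of silence -> "
                + stageAfterSilence(Duration.ofMillis(ms), memberTimeout));
        }
    }
}
```

With the default of 5000ms, a member that goes completely silent reaches the final check after roughly 10 seconds, and is removed only if that check also fails.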

So, you could increase your member-timeout setting or maybe investigate 
why messages, especially heartbeats, aren't being received.  A TCP/IP 
performance measuring tool might help in that regard: run one to see 
what the packet-loss percentage is and, if it's high, look into why that is.
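
As a small illustration of the first option, the sketch below builds a properties object with a larger member-timeout. The value 10000 is an arbitrary example, not a recommendation, and the class and method names are hypothetical. How the properties reach the member (a gemfire.properties file, or being passed in when the cache is created) is assumed here and depends on how you start your members.

```java
import java.util.Properties;

public class MemberTimeoutConfig {
    // Builds distributed-system properties with the given member-timeout in milliseconds.
    static Properties withMemberTimeout(int millis) {
        Properties props = new Properties();
        props.setProperty("member-timeout", Integer.toString(millis));
        return props;
    }

    public static void main(String[] args) {
        // 10000 is only an example value, double the 5000ms default mentioned above.
        Properties props = withMemberTimeout(10000);
        // Assumption: these properties are supplied to the member at startup.
        System.out.println("member-timeout=" + props.getProperty("member-timeout"));
    }
}
```

Keep in mind that a larger timeout also means a member that really has failed takes longer to be detected.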

It's also possible that garbage collection is kicking in on the member 
that "isn't responding to heartbeat requests" or that it's not getting 
enough CPU for other reasons.

On 2/25/19 2:39 AM, Avital Amity wrote:
> Hi,
> I have an environment where servers and locators go down from time to 
> time with the below error:
> Member isn't responding to heartbeat requests
> Any suggestion regarding relevant configuration/other thing to check? 
> What can lead to this issue?
> Thanks
> Avital
