ignite-dev mailing list archives

From "Alexander Lapin (Jira)" <j...@apache.org>
Subject [jira] [Created] (IGNITE-13374) Initial PME hangs because of multiple blinking nodes
Date Wed, 19 Aug 2020 07:41:00 GMT
Alexander Lapin created IGNITE-13374:

             Summary: Initial PME hangs because of multiple blinking nodes
                 Key: IGNITE-13374
                 URL: https://issues.apache.org/jira/browse/IGNITE-13374
             Project: Ignite
          Issue Type: Bug
            Reporter: Alexander Lapin
            Assignee: Alexander Lapin
             Fix For: 2.10

The *root cause* of the issue is a race inside GridDhtPartitionsExchangeFuture on the client side
between two processes:
 # When the old coordinator fails and the new one takes over, the new coordinator sends GridDhtPartitionsSingleRequest
messages to all nodes, including clients, to restore exchange results. Processing this message
on a client includes updating the current coordinator reference (the crd field).

 # When the future receives the discovery notification about the old coordinator's failure, it should detect
the coordinator change and send a GridDhtPartitionsSingleMessage to the new coordinator to obtain
affinity. However, if the crd field has already been updated by the request above, the client does not
detect the coordinator failure and never sends the SingleMessage to the new coordinator, so the client hangs.
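The ordering problem can be illustrated with a small Java sketch. It is a hypothetical, heavily simplified model, not Ignite code: the class `ClientExchangeModel` and its methods `onSingleRequest` and `onNodeLeft` are invented for illustration, and the model tracks only the client's `crd` bookkeeping described above. When the discovery notification is processed first, the client detects the coordinator change and sends its message. When the new coordinator's request is processed first, the check against the already-updated `crd` fails and nothing is sent.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, heavily simplified model of the client-side race in IGNITE-13374.
 * Names are illustrative only and are NOT part of Ignite's API.
 */
public class ClientExchangeModel {
    /** Coordinator as currently seen by the client (mirrors the crd field). */
    private String crd;

    /** Coordinators to which the client has sent a "single message". */
    private final List<String> singleMessagesSentTo = new ArrayList<>();

    public ClientExchangeModel(String initialCoordinator) {
        this.crd = initialCoordinator;
    }

    /** Process 1: the new coordinator's single request arrives; the client updates crd. */
    public void onSingleRequest(String newCoordinator) {
        crd = newCoordinator;
    }

    /**
     * Process 2: discovery reports that a node left. The client reacts only if the
     * failed node equals the coordinator it currently has in crd.
     */
    public void onNodeLeft(String failedNode, String newCoordinator) {
        if (failedNode.equals(crd)) {
            crd = newCoordinator;
            singleMessagesSentTo.add(newCoordinator);
        }
        // Otherwise the leave is not recognized as a coordinator failure: nothing is sent.
    }

    public List<String> singleMessagesSentTo() {
        return singleMessagesSentTo;
    }

    public static void main(String[] args) {
        // Discovery notification first: change detected, message sent to the new coordinator.
        ClientExchangeModel ok = new ClientExchangeModel("oldCrd");
        ok.onNodeLeft("oldCrd", "newCrd");
        ok.onSingleRequest("newCrd");
        System.out.println("discovery first: sent to " + ok.singleMessagesSentTo());

        // Single request first (the race): crd is already "newCrd", so the failure
        // of "oldCrd" goes undetected and no message is sent.
        ClientExchangeModel hung = new ClientExchangeModel("oldCrd");
        hung.onSingleRequest("newCrd");
        hung.onNodeLeft("oldCrd", "newCrd");
        System.out.println("request first:   sent to " + hung.singleMessagesSentTo());
    }
}
```

Running `main` prints `[newCrd]` for the first ordering and `[]` for the second; in the real client the empty case corresponds to the exchange future waiting forever for affinity from the new coordinator.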

This message was sent by Atlassian Jira