gearpump-dev mailing list archives

From "Sean Zhong (JIRA)" <j...@apache.org>
Subject [jira] [Created] (GEARPUMP-8) Two machines can possibly have same worker Id when master restart in single-master cluster
Date Sat, 02 Apr 2016 06:30:25 GMT
Sean Zhong created GEARPUMP-8:
---------------------------------

             Summary: Two machines can possibly have same worker Id when master restart in
single-master cluster
                 Key: GEARPUMP-8
                 URL: https://issues.apache.org/jira/browse/GEARPUMP-8
             Project: Apache Gearpump
          Issue Type: Bug
            Reporter: Sean Zhong



*Why should we NOT allow duplicate worker ids?*
We use the worker id to track the resources of a single machine. If two machines have the
same worker id, their resources can no longer be told apart, which would create a lot of confusion.

*Pre-condition to trigger this issue?*
This happens when the cluster has only one master and that master restarts.
A cluster with multiple masters is not affected by this issue.

*How does this issue happen?*
When the master restarts and there are no other master machines for HA, the master state is
lost, including the list of worker ids already occupied by existing workers.
When a new worker machine then joins, it gets a fresh worker id starting from 0, which
can conflict with the id of an existing worker machine.
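
The collision described above can be reproduced with a minimal sketch. The names below
(SequentialMaster, register, CollisionDemo) are hypothetical stand-ins and not Gearpump's
actual classes; the sketch only assumes that the master keeps its id counter in memory and
that a restart resets it.

```scala
// Hypothetical stand-in for a single master that hands out sequential worker ids.
// The counter lives only in memory, so a restarted master begins again at 0.
final class SequentialMaster {
  private var nextId = 0

  // Returns the next free id and advances the counter.
  def register(): Int = {
    val id = nextId
    nextId += 1
    id
  }
}

object CollisionDemo {
  // Worker A registers with the original master and keeps running.
  val workerA: Int = new SequentialMaster().register()

  // The master restarts without an HA peer (modelled here as a fresh instance),
  // so the list of occupied ids is gone when worker B registers.
  val workerB: Int = new SequentialMaster().register()

  // Both workers now hold the same id.
  val collision: Boolean = workerA == workerB
}
```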

*Suggested fix?*
Instead of using the sequence 0, 1, 2, 3, 4... for the worker id, we append a timestamp, which is
the time at which the worker registers itself with the master.

Like this:
{quote}
WorkerId(0, timestamp1)
WorkerId(1, timestamp2)
...
{quote}

Then, when the master is restarted, the new worker and the old worker can be told apart by the
timestamp, since their registration times differ.
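
A minimal sketch of the proposed id, under stated assumptions: the field names (sessionId,
registerTime), the TimestampedMaster class, and the injected clock are illustrative choices,
not taken from the Gearpump code base. Fixed clock values replace System.currentTimeMillis()
so the example stays deterministic.

```scala
// Hypothetical shape of the proposed id: a sequence number plus the registration time.
// Case class equality compares both fields, so equal sequence numbers alone do not collide.
final case class WorkerId(sessionId: Int, registerTime: Long)

// Hypothetical master that stamps each id with the time of registration.
// The clock is passed in so the example does not depend on the wall clock.
final class TimestampedMaster(clock: () => Long) {
  private var nextId = 0

  def register(): WorkerId = {
    val id = WorkerId(nextId, clock())
    nextId += 1
    id
  }
}

object TimestampDemo {
  // Worker registered before the master restart.
  val oldWorker: WorkerId = new TimestampedMaster(() => 1000L).register()

  // Worker registered after the restart: same sequence number, later timestamp.
  val newWorker: WorkerId = new TimestampedMaster(() => 2000L).register()

  // The two ids differ because the registration times differ.
  val distinct: Boolean = oldWorker != newWorker
}
```

The sketch relies on the same assumption as the proposal: a worker registering after the
restart does not receive exactly the same timestamp as an earlier worker with the same
sequence number.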



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
