From commits-return-5766-apmail-helix-commits-archive=helix.apache.org@helix.apache.org Thu May 26 04:32:41 2016
Return-Path:
X-Original-To: apmail-helix-commits-archive@minotaur.apache.org
Delivered-To: apmail-helix-commits-archive@minotaur.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
	by minotaur.apache.org (Postfix) with SMTP id 8E3A519188
	for ; Thu, 26 May 2016 04:32:41 +0000 (UTC)
Received: (qmail 45484 invoked by uid 500); 26 May 2016 04:32:41 -0000
Delivered-To: apmail-helix-commits-archive@helix.apache.org
Received: (qmail 45407 invoked by uid 500); 26 May 2016 04:32:40 -0000
Mailing-List: contact commits-help@helix.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@helix.apache.org
Delivered-To: mailing list commits@helix.apache.org
Received: (qmail 45377 invoked by uid 99); 26 May 2016 04:32:40 -0000
Received: from git1-us-west.apache.org (HELO git1-us-west.apache.org) (140.211.11.23)
	by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 26 May 2016 04:32:40 +0000
Received: by git1-us-west.apache.org (ASF Mail Server at git1-us-west.apache.org, from userid 33)
	id 13910DFC6D; Thu, 26 May 2016 04:32:40 +0000 (UTC)
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
From: kishoreg@apache.org
To: commits@helix.apache.org
Date: Thu, 26 May 2016 04:32:40 -0000
Message-Id: <9aa325bf78d84801a6f73291893cfc3b@git.apache.org>
X-Mailer: ASF-Git Admin Mailer
Subject: [1/2] helix git commit: fixed a bug at WriteLock caused by read-delete race on a znode.

Repository: helix
Updated Branches:
  refs/heads/master 196675cd9 -> f011ea3ee


fixed a bug at WriteLock caused by read-delete race on a znode.

Bug description:
T1 currently owns a zk lock, as signified by znode n1. T2 creates a znode n2 and realizes n1 is smaller.
T2 is about to register a watcher on n1, but at the same moment T1 releases n1. T2's watcher registration fails, so it breaks from the while loop and calls wait().
Nobody will ever wake up T2 again. Consequently all subsequent callers for the same lock are also blocked.

Test:
Repeated our loadtest and the bug doesn't reappear.

For detailed bug report see this post:
http://mail-archives.apache.org/mod_mbox/helix-dev/201605.mbox/%3CCAB-bdySG8Uf6c1fyVHpSu-5pD99VHE=mrL=j3QNkaTWaEtKQ+w@mail.gmail.com%3E


Project: http://git-wip-us.apache.org/repos/asf/helix/repo
Commit: http://git-wip-us.apache.org/repos/asf/helix/commit/6ecac13e
Tree: http://git-wip-us.apache.org/repos/asf/helix/tree/6ecac13e
Diff: http://git-wip-us.apache.org/repos/asf/helix/diff/6ecac13e

Branch: refs/heads/master
Commit: 6ecac13e42c52f854450c98e33d2e2624d0f6167
Parents: 94e1079
Author: neutronsharc
Authored: Thu May 19 15:29:56 2016 -0700
Committer: neutronsharc
Committed: Thu May 19 15:40:01 2016 -0700

----------------------------------------------------------------------
 .../src/main/java/org/apache/helix/lock/zk/WriteLock.java | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/helix/blob/6ecac13e/helix-core/src/main/java/org/apache/helix/lock/zk/WriteLock.java
----------------------------------------------------------------------
diff --git a/helix-core/src/main/java/org/apache/helix/lock/zk/WriteLock.java b/helix-core/src/main/java/org/apache/helix/lock/zk/WriteLock.java
index aef7618..b842ff8 100644
--- a/helix-core/src/main/java/org/apache/helix/lock/zk/WriteLock.java
+++ b/helix-core/src/main/java/org/apache/helix/lock/zk/WriteLock.java
@@ -179,7 +179,7 @@ class WriteLock extends ProtocolSupport {
         List<String> names = zookeeper.getChildren(dir, false);
         for (String name : names) {
           if (name.startsWith(prefix)) {
-            id = name;
+            id = dir + "/" + name;
             if (LOG.isDebugEnabled()) {
               LOG.debug("Found id created last time: " + id);
             }
@@ -230,14 +230,15 @@ class WriteLock extends ProtocolSupport {
             ZNodeName lastChildName = lessThanMe.last();
             lastChildId = lastChildName.getName();
             if (LOG.isDebugEnabled()) {
-              LOG.debug("watching less than me node: " + lastChildId);
+              LOG.debug("watching less than me node: " + lastChildId + ", my id: " + idName.getName());
             }
             Stat stat = zookeeper.exists(lastChildId, new LockWatcher());
             if (stat != null) {
               return Boolean.FALSE;
             } else {
               LOG.warn("Could not find the" + " stats for less than me: "
-                  + lastChildName.getName());
+                  + lastChildName.getName() + ", will retry");
+              id = null;
             }
           } else {
             if (isOwner()) {
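
The race in the bug description can be reproduced without a ZooKeeper ensemble. The sketch below is only an in-memory model, not the WriteLock code: the class LockRetrySketch, its methods, and the node names are hypothetical, and a sorted set of strings stands in for the lock directory. It assumes, as the "will retry" log message and the `id = null` line in the diff suggest, that a failed watch on the predecessor should lead to another pass over the children rather than to a wait.

```java
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.function.Predicate;

/** Hypothetical in-memory model of the retry; no ZooKeeper client is involved. */
final class LockRetrySketch {
    enum Outcome { OWNER, WATCHING, RETRY }

    static final class Result {
        final Outcome outcome;
        final int passes;
        Result(Outcome outcome, int passes) {
            this.outcome = outcome;
            this.passes = passes;
        }
    }

    /** One pass: inspect a snapshot of the children, then try to watch the predecessor. */
    static Outcome attemptOnce(SortedSet<String> snapshot, String me,
                               Predicate<String> existsAndWatch) {
        SortedSet<String> lessThanMe = snapshot.headSet(me); // strictly smaller nodes
        if (lessThanMe.isEmpty()) {
            return Outcome.OWNER;                            // nothing ahead of us
        }
        String lastChildId = lessThanMe.last();
        // If the predecessor vanished between the snapshot and the watch,
        // ask for another pass instead of waiting for a wake-up that never comes.
        return existsAndWatch.test(lastChildId) ? Outcome.WATCHING : Outcome.RETRY;
    }

    /** Repeats attemptOnce until it settles; optionally injects the read-delete race once. */
    static Result acquire(SortedSet<String> live, String me, String releasedDuringRace) {
        int passes = 0;
        boolean raceInjected = false;
        while (true) {
            passes++;
            SortedSet<String> snapshot = new TreeSet<>(live);
            if (!raceInjected && releasedDuringRace != null) {
                live.remove(releasedDuringRace);             // T1 releases n1 right here
                raceInjected = true;
            }
            Outcome outcome = attemptOnce(snapshot, me, live::contains);
            if (outcome != Outcome.RETRY) {
                return new Result(outcome, passes);
            }
        }
    }

    public static void main(String[] args) {
        SortedSet<String> live =
            new TreeSet<>(Arrays.asList("n-0000000001", "n-0000000002"));
        Result r = acquire(live, "n-0000000002", "n-0000000001");
        System.out.println(r.outcome + " after " + r.passes + " passes");
    }
}
```

With the race injected, the first pass reports RETRY and the second pass finds no smaller node, so main prints "OWNER after 2 passes". Without the retry branch, the first pass would end with the caller waiting on a node that no longer exists, which is the hang described in the commit message.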