hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "David Mollitor (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-21469) Review of ZooKeeperHiveLockManager
Date Tue, 19 Mar 2019 02:09:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-21469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

David Mollitor updated HIVE-21469:
----------------------------------
    Attachment: HIVE-21469.1.patch

> Review of ZooKeeperHiveLockManager
> ----------------------------------
>
>                 Key: HIVE-21469
>                 URL: https://issues.apache.org/jira/browse/HIVE-21469
>             Project: Hive
>          Issue Type: Improvement
>          Components: Locking
>    Affects Versions: 4.0.0, 3.2.0
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Major
>         Attachments: HIVE-21469.1.patch
>
>
> A lot of sins in this class to resolve:
> {code:java}
>   @Override
>   public void setContext(HiveLockManagerCtx ctx) throws LockException {
>  try {
>       curatorFramework = CuratorFrameworkSingleton.getInstance(conf);
>       parent = conf.getVar(HiveConf.ConfVars.HIVE_ZOOKEEPER_NAMESPACE);
>       try{
>         curatorFramework.create().withMode(CreateMode.PERSISTENT).forPath("/" +  parent,
new byte[0]);
>       } catch (Exception e) {
>         // ignore if the parent already exists
>         if (!(e instanceof KeeperException) || ((KeeperException)e).code() != KeeperException.Code.NODEEXISTS)
{
>           LOG.warn("Unexpected ZK exception when creating parent node /" + parent, e);
>         }
>       }
> {code}
> Every time a new session is created and this {{setContext}} method is called, it attempts
to create the root node.  I have seen that, even though the root node exists, an create node
action is written to the ZK logs.  Check first if the node exists before trying to create
it.
> {code:java}
>       try {
>         curatorFramework.delete().forPath(zLock.getPath());
>       } catch (InterruptedException ie) {
>         curatorFramework.delete().forPath(zLock.getPath());
>       }
> {code}
> There has historically been a quite a few bugs regarding leaked locks.  The Driver will
signal the session {{Thread}} by performing an interrupt.  That interrupt can happen any time
and it can kill a create/delete action within the ZK framework.  We can see one example of
workaround for this.  If the ZK action is interrupted, simply do it again.  Well, what if
it's interrupted yet again?  The lock will be leaked anyway.  Also, when the {{InterruptedException}}
is caught in the try block, the thread's interrupted flag is cleared.  The flag is not reset
in this code and therefore we lose the fact that this thread has been interrupted.
> {code:java}
>         if (tryNum > 1) {
>           Thread.sleep(sleepTime);
>         }
>         unlockPrimitive(hiveLock, parent, curatorFramework);
>         break;
>       } catch (Exception e) {
>         if (tryNum >= numRetriesForUnLock) {
>           String name = ((ZooKeeperHiveLock)hiveLock).getPath();
>           throw new LockException("Node " + name + " can not be deleted after " + numRetriesForUnLock
+ " attempts.",
>               e);
>         }
>       }
> {code}
> ... related... the sleep here may be interrupted, but we still need to delete the lock
(again, for fear of leaking it).  This sleep should be uninterruptible.  If we need to get
the lock deleted, and there's a problem, interrupting the sleep will cause the code to eventually
exit and locks will be leaked.
> It also requires a bunch more TLC.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message