Joel Baranick created HELIX-628:
-----------------------------------
Summary: ZKHelixAdmin silently fails to fully cleanup the ZK structure
Key: HELIX-628
URL: https://issues.apache.org/jira/browse/HELIX-628
Project: Apache Helix
Issue Type: Bug
Affects Versions: 0.6.x
Reporter: Joel Baranick
ZKHelixAdmin silently fails to fully clean up the ZooKeeper structure corresponding to the
Helix cluster instance, even though it is configured to do the cleanup before anything else
starts up. The cause is unknown. As a result, the YARN application fails to start.
{code:title=Shutdown|borderStyle=solid}
2016-02-17 06:25:01 UTC INFO [Thread-4] gobblin.yarn.GobblinYarnAppLauncher 301 - Stopping
the GobblinYarnAppLauncher
2016-02-17 06:25:01 UTC INFO [Thread-4] org.apache.helix.messaging.DefaultMessagingService
84 - Send 1 messages with criteria instanceName=%resourceName=%partitionName=%partitionState=%
2016-02-17 06:25:02 UTC INFO [LogCopier STOPPING] gobblin.util.ExecutorsUtils 125 - Attempting
to shutdown ExecutorService: java.util.concurrent.ScheduledThreadPoolExecutor@73240b61[Shutting
down, pool size = 1, active threads = 0, queued tasks = 0, completed tasks = 1862]
2016-02-17 06:25:02 UTC INFO [LogCopier STOPPING] gobblin.util.ExecutorsUtils 144 - Successfully
shutdown ExecutorService: java.util.concurrent.ScheduledThreadPoolExecutor@73240b61[Terminated,
pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 1862]
2016-02-17 06:25:02 UTC INFO [JobExecutionInfoServer STOPPING] gobblin.rest.JobExecutionInfoServer
94 - Stopping the job execution information server
Shutting down
2016-02-17 06:25:02 UTC INFO [AdminWebServer STOPPING] org.eclipse.jetty.server.AbstractConnector
306 - Stopped ServerConnector@35e0c350{HTTP/1.1}{localhost:8280}
2016-02-17 06:25:02 UTC INFO [Thread-4] gobblin.util.ExecutorsUtils 125 - Attempting to
shutdown ExecutorService: java.util.concurrent.Executors$DelegatedScheduledExecutorService@185aaf1f
2016-02-17 06:25:02 UTC INFO [Thread-4] gobblin.util.ExecutorsUtils 144 - Successfully shutdown
ExecutorService: java.util.concurrent.Executors$DelegatedScheduledExecutorService@185aaf1f
2016-02-17 06:25:02 UTC INFO [Thread-4] org.apache.helix.manager.zk.ZKHelixManager 546 -
disconnect ip-169-0-0-1(SPECTATOR) from GobblinYarn
2016-02-17 06:25:02 UTC INFO [Thread-4] org.apache.helix.messaging.handling.HelixTaskExecutor
679 - Shutting down HelixTaskExecutor
2016-02-17 06:25:02 UTC INFO [Thread-4] org.apache.helix.messaging.handling.HelixTaskExecutor
443 - Reset HelixTaskExecutor
2016-02-17 06:25:02 UTC INFO [Thread-4] org.apache.helix.messaging.handling.HelixTaskExecutor
453 - Reset exectuor for msgType: TASK_REPLY, pool: java.util.concurrent.ThreadPoolExecutor@3f197a46[Running,
pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
2016-02-17 06:25:02 UTC INFO [Thread-4] org.apache.helix.messaging.handling.HelixTaskExecutor
397 - Shutting down pool: java.util.concurrent.ThreadPoolExecutor@3f197a46[Running, pool
size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
2016-02-17 06:25:02 UTC INFO [Thread-4] org.apache.helix.messaging.handling.HelixTaskExecutor
684 - Shutdown HelixTaskExecutor finished
2016-02-17 06:25:02 UTC INFO [Thread-4] org.apache.helix.manager.zk.ZkClient 130 - Closing
zkclient: State:CONNECTED Timeout:30000 sessionid:0xd452eb397b640065 local:/169.0.0.1:51319
remoteserver:ip-138-0-0-1.ec2.internal/138.0.0.2:2181 lastZxid:60129782948 xid:17 sent:140
recv:140 queuedpkts:0 pendingresp:0 queuedevents:0
2016-02-17 06:25:02 UTC INFO [ZkClient-EventThread-17-zk.server:2181] org.I0Itec.zkclient.ZkEventThread
82 - Terminate ZkClient event thread.
2016-02-17 06:25:02 UTC INFO [main-EventThread] org.apache.zookeeper.ClientCnxn$EventThread
512 - EventThread shut down
2016-02-17 06:25:02 UTC INFO [Thread-4] org.apache.zookeeper.ZooKeeper 684 - Session: 0xd452eb397b640065
closed
2016-02-17 06:25:02 UTC INFO [Thread-4] org.apache.helix.manager.zk.ZkClient 157 - Closed
zkclient
2016-02-17 06:25:02 UTC INFO [Thread-4] org.apache.helix.manager.zk.ZKHelixManager 570 -
Cluster manager: ip-169-0-0-1 disconnected
2016-02-17 06:25:02 UTC INFO [Thread-4] gobblin.yarn.GobblinYarnAppLauncher 722 - Deleting
application working directory hdfs://ec2-145-0-0-1.compute-1.amazonaws.com:9000/user/yarn/GobblinYarn/application_1455654714320_0004
{code}
{code:title=Startup|borderStyle=solid}
2016-02-17 06:51:32 UTC INFO [main] org.apache.zookeeper.Environment 100 - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2016-02-17 06:51:32 UTC INFO [main] org.apache.zookeeper.Environment 100 - Client environment:java.io.tmpdir=/tmp
2016-02-17 06:51:32 UTC INFO [main] org.apache.zookeeper.Environment 100 - Client environment:java.compiler=<NA>
2016-02-17 06:51:32 UTC INFO [main] org.apache.zookeeper.Environment 100 - Client environment:os.name=Linux
2016-02-17 06:51:32 UTC INFO [main] org.apache.zookeeper.Environment 100 - Client environment:os.arch=amd64
2016-02-17 06:51:32 UTC INFO [main] org.apache.zookeeper.Environment 100 - Client environment:os.version=3.19.0-49-generic
2016-02-17 06:51:32 UTC INFO [main] org.apache.zookeeper.Environment 100 - Client environment:user.name=yarn
2016-02-17 06:51:32 UTC INFO [main] org.apache.zookeeper.Environment 100 - Client environment:user.home=/home/yarn
2016-02-17 06:51:32 UTC INFO [main] org.apache.zookeeper.Environment 100 - Client environment:user.dir=/opt/app/gobblin/00046-cfdc834
2016-02-17 06:51:32 UTC INFO [main] org.apache.zookeeper.ZooKeeper 438 - Initiating client
connection, connectString=zk.server:2181 sessionTimeout=30000 watcher=org.apache.helix.manager.zk.ZkClient@35e52059
2016-02-17 06:51:32 UTC INFO [main-SendThread(ip-169-0-0-1.ec2.internal:2181)] org.apache.zookeeper.ClientCnxn$SendThread
975 - Opening socket connection to server ip-169-0-0-1.ec2.internal/169.0.0.1:2181. Will
not attempt to authenticate using SASL (unknown error)
2016-02-17 06:51:32 UTC INFO [main-SendThread(ip-169-0-0-1.ec2.internal:2181)] org.apache.zookeeper.ClientCnxn$SendThread
852 - Socket connection established to ip-169-0-0-1.ec2.internal/169.0.0.1:2181, initiating
session
2016-02-17 06:51:32 UTC INFO [main-SendThread(ip-169-0-0-1.ec2.internal:2181)] org.apache.zookeeper.ClientCnxn$SendThread
1235 - Session establishment complete on server ip-169-0-0-1.ec2.internal/169.0.0.1:2181,
sessionid = 0x5b52eb397b640080, negotiated timeout = 30000
2016-02-17 06:51:32 UTC INFO [main-EventThread] org.I0Itec.zkclient.ZkClient 449 - zookeeper
state changed (SyncConnected)
2016-02-17 06:51:32 UTC WARN [main] org.apache.helix.manager.zk.ZKHelixAdmin 495 - Root
directory exists.Cleaning the root directory:/GobblinYarn
Exception in thread "main" org.I0Itec.zkclient.exception.ZkException: org.apache.zookeeper.KeeperException$NotEmptyException: KeeperErrorCode = Directory not empty for /GobblinYarn/CONTROLLER
    at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:68)
    at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685)
    at org.apache.helix.manager.zk.ZkClient.delete(ZkClient.java:348)
    at org.I0Itec.zkclient.ZkClient.deleteRecursive(ZkClient.java:516)
    at org.I0Itec.zkclient.ZkClient.deleteRecursive(ZkClient.java:511)
    at org.apache.helix.manager.zk.ZKHelixAdmin.addCluster(ZKHelixAdmin.java:496)
    at org.apache.helix.tools.ClusterSetup.addCluster(ClusterSetup.java:154)
    at gobblin.yarn.YarnHelixUtils.createGobblinYarnHelixCluster(YarnHelixUtils.java:67)
    at gobblin.yarn.GobblinYarnAppLauncher.launch(GobblinYarnAppLauncher.java:243)
    at gobblin.yarn.GobblinYarnAppLauncher.main(GobblinYarnAppLauncher.java:784)
Caused by: org.apache.zookeeper.KeeperException$NotEmptyException: KeeperErrorCode = Directory not empty for /GobblinYarn/CONTROLLER
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:125)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
    at org.I0Itec.zkclient.ZkConnection.delete(ZkConnection.java:91)
    at org.apache.helix.manager.zk.ZkClient$8.call(ZkClient.java:352)
    at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
    ... 8 more
2016-02-17 06:51:33 UTC INFO [Thread-4] gobblin.yarn.GobblinYarnAppLauncher 301 - Stopping
the GobblinYarnAppLauncher
2016-02-17 06:51:33 UTC INFO [Thread-4] gobblin.util.ExecutorsUtils 125 - Attempting to
shutdown ExecutorService: java.util.concurrent.Executors$DelegatedScheduledExecutorService@2c68b710
2016-02-17 06:51:33 UTC INFO [Thread-4] gobblin.util.ExecutorsUtils 144 - Successfully shutdown
ExecutorService: java.util.concurrent.Executors$DelegatedScheduledExecutorService@2c68b710
{code}
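The startup trace above fails inside ZkClient.deleteRecursive with a NotEmptyException for
/GobblinYarn/CONTROLLER. The root cause has not been identified. One way such an error can
arise is that a recursive delete is not atomic: the children are listed and removed first, and
the parent is deleted afterwards, so a node created under the parent in between makes the final
delete fail. The sketch below illustrates only that general mechanism with a toy in-memory
tree. It does not use the real ZooKeeper, ZkClient, or Helix classes. The class name
RecursiveDeleteRace, the beforeDelete hook, and the MESSAGES and LEADER child names are
assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Consumer;

/** Toy in-memory stand-in for a ZooKeeper namespace (hypothetical, not the Helix API). */
class RecursiveDeleteRace {

    /** Mirrors the ZooKeeper rule that a node with children cannot be deleted. */
    static class NotEmptyException extends RuntimeException {
        NotEmptyException(String path) {
            super("Directory not empty for " + path);
        }
    }

    private final Set<String> nodes = new TreeSet<>();

    /** Hook run just before each delete; used to simulate another client writing. */
    Consumer<String> beforeDelete = path -> { };

    void create(String path) {
        nodes.add(path);
    }

    boolean exists(String path) {
        return nodes.contains(path);
    }

    /** Returns a snapshot of the direct children of the given path. */
    List<String> children(String path) {
        String prefix = path + "/";
        List<String> result = new ArrayList<>();
        for (String node : nodes) {
            boolean direct = node.indexOf('/', prefix.length()) < 0;
            if (node.startsWith(prefix) && direct) {
                result.add(node);
            }
        }
        return result;
    }

    /** Deletes a single node and refuses if it still has children. */
    void delete(String path) {
        beforeDelete.accept(path);
        if (!children(path).isEmpty()) {
            throw new NotEmptyException(path);
        }
        nodes.remove(path);
    }

    /** Children first, then the parent: two separate steps, so not atomic. */
    void deleteRecursive(String path) {
        for (String child : children(path)) {
            deleteRecursive(child);
        }
        delete(path);
    }

    public static void main(String[] args) {
        RecursiveDeleteRace zk = new RecursiveDeleteRace();
        zk.create("/GobblinYarn");
        zk.create("/GobblinYarn/CONTROLLER");
        zk.create("/GobblinYarn/CONTROLLER/MESSAGES");

        // Assumption: some other client recreates a child just before the parent is deleted.
        zk.beforeDelete = path -> {
            if (path.equals("/GobblinYarn/CONTROLLER")) {
                zk.create("/GobblinYarn/CONTROLLER/LEADER");
            }
        };

        try {
            zk.deleteRecursive("/GobblinYarn");
        } catch (NotEmptyException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If this mechanism were the cause here, it would suggest that something was still writing under
/GobblinYarn/CONTROLLER while the launcher tried to recreate the cluster. That remains to be
confirmed against the actual ZooKeeper state.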
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)