hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "KaiXu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-15671) RPCServer.registerClient() erroneously uses server/client handshake timeout for connection timeout
Date Wed, 08 Feb 2017 09:47:42 GMT

    [ https://issues.apache.org/jira/browse/HIVE-15671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857738#comment-15857738
] 

KaiXu commented on HIVE-15671:
------------------------------

I may encounter this situation you mentioned. I run a query, Hive on Spark, failed with error:
2017-02-08 09:50:59,331 Stage-2_0: 1039(+2)/1041        Stage-3_0: 796(+456)/1520       Stage-4_0:
0/2021       Stage-5_0: 0/1009       Stage-6_0: 0/1
2017-02-08 09:51:00,335 Stage-2_0: 1040(+1)/1041        Stage-3_0: 914(+398)/1520       Stage-4_0:
0/2021       Stage-5_0: 0/1009       Stage-6_0: 0/1
2017-02-08 09:51:01,338 Stage-2_0: 1041/1041 Finished   Stage-3_0: 961(+383)/1520       Stage-4_0:
0/2021       Stage-5_0: 0/1009       Stage-6_0: 0/1
Failed to monitor Job[ 2] with exception 'java.lang.IllegalStateException(RPC channel is closed.)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask

the driver was indeed failed with some unknown reason:

17/02/08 09:51:04 INFO exec.Utilities: PLAN PATH = hdfs://hsx-node1:8020/tmp/hive/root/b723c85d-2a7b-469e-bab1-9c165b25e656/hive_2017-02-08_09-49-37_890_6267025825539539056-1/-mr-10006/71a9dacb-a463-40ef-9e86-78d3b8e3738d/map.xml
17/02/08 09:51:04 INFO executor.Executor: Executor killed task 1169.0 in stage 3.0 (TID 2519)
17/02/08 09:51:04 INFO executor.CoarseGrainedExecutorBackend: Driver commanded a shutdown
17/02/08 09:51:04 INFO storage.MemoryStore: MemoryStore cleared
17/02/08 09:51:04 INFO storage.BlockManager: BlockManager stopped
17/02/08 09:51:04 INFO exec.Utilities: PLAN PATH = hdfs://hsx-node1:8020/tmp/hive/root/b723c85d-2a7b-469e-bab1-9c165b25e656/hive_2017-02-08_09-49-37_890_6267025825539539056-1/-mr-10006/71a9dacb-a463-40ef-9e86-78d3b8e3738d/map.xml
17/02/08 09:51:04 WARN executor.CoarseGrainedExecutorBackend: An unknown (hsx-node1:42777)
driver disconnected.
17/02/08 09:51:04 ERROR executor.CoarseGrainedExecutorBackend: Driver 192.168.1.1:42777 disassociated!
Shutting down.
17/02/08 09:51:04 INFO executor.Executor: Executor killed task 1105.0 in stage 3.0 (TID 2511)
17/02/08 09:51:04 INFO util.ShutdownHookManager: Shutdown hook called
17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote
daemon.
17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk6/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-71da1dfc-99bd-4687-bc2f-33452db8de3d
17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk2/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-7f134d81-e77e-4b92-bd99-0a51d0962c14
17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk5/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-77a90d63-fb05-4bc6-8d5e-1562cc502e6c
17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut
down; proceeding with flushing remote transports.
17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk4/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-91f8b91a-114d-4340-8560-d3cd085c1cd4
17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk1/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-a3c24f9e-8609-48f0-9d37-0de7ae06682a
17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk7/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-f6120a43-2158-4780-927c-c5786b78f53e
17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk3/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-e17931ad-9e8a-45da-86f8-9a0fdca0fad1
17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk8/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-4de34175-f871-4c28-8ec0-d2fc0020c5c3
17/02/08 09:51:04 INFO executor.Executor: Executor killed task 1137.0 in stage 3.0 (TID 2515)
17/02/08 09:51:04 INFO executor.Executor: Executor killed task 897.0 in stage 3.0 (TID 2417)
17/02/08 09:51:04 INFO executor.Executor: Executor killed task 1225.0 in stage 3.0 (TID 2526)
17/02/08 09:51:04 INFO executor.Executor: Executor killed task 905.0 in stage 3.0 (TID 2423)

in hive's log, 

2017-02-08T09:51:04,327  INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 INFO
scheduler.TaskSetManager: Finished task 971.0 in stage 3.0 (TID 2218) in 5948 ms on hsx-node8
(1338/1520)
2017-02-08T09:51:04,346  INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 INFO
rpc.RpcDispatcher: [DriverProtocol] Closing channel due to exception in pipeline (org.apache.hive.spark.client.RemoteDriver$DriverProtocol.handle(io.netty.channel.ChannelHandlerContext,
org.apache.hive.spark.client.rpc.Rpc$MessageHeader)).
2017-02-08T09:51:04,346  INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 WARN
rpc.RpcDispatcher: [DriverProtocol] Expected RPC header, got org.apache.hive.spark.client.rpc.Rpc$NullMessage
instead.
2017-02-08T09:51:04,347  INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 INFO
rpc.RpcDispatcher: [DriverProtocol] Closing channel due to exception in pipeline (null).
2017-02-08T09:51:04,347  INFO [RPC-Handler-3] rpc.RpcDispatcher: [ClientProtocol] Closing
channel due to exception in pipeline (Connection reset by peer).
2017-02-08T09:51:04,347  INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 ERROR
scheduler.LiveListenerBus: Listener ClientListener threw an exception
2017-02-08T09:51:04,347  INFO [stderr-redir-1] client.SparkClientImpl: java.lang.IllegalStateException:
RPC channel is closed.
2017-02-08T09:51:04,347  INFO [stderr-redir-1] client.SparkClientImpl:  at com.google.common.base.Preconditions.checkState(Preconditions.java:149)
2017-02-08T09:51:04,348  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:276)
2017-02-08T09:51:04,348  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:259)
2017-02-08T09:51:04,348  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.hive.spark.client.RemoteDriver$DriverProtocol.sendMetrics(RemoteDriver.java:270)
2017-02-08T09:51:04,348  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.hive.spark.client.RemoteDriver$ClientListener.onTaskEnd(RemoteDriver.java:490)
2017-02-08T09:51:04,348  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:42)
2017-02-08T09:51:04,348  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
2017-02-08T09:51:04,348  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
2017-02-08T09:51:04,348  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:55)
2017-02-08T09:51:04,348  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37)
2017-02-08T09:51:04,348  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(AsynchronousListenerBus.scala:80)
2017-02-08T09:51:04,348  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
2017-02-08T09:51:04,348  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
2017-02-08T09:51:04,348  INFO [stderr-redir-1] client.SparkClientImpl:  at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
2017-02-08T09:51:04,348  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:64)
2017-02-08T09:51:04,348  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1181)
2017-02-08T09:51:04,348  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:63)
2017-02-08T09:51:04,348  INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 WARN
rpc.Rpc: Failed to send RPC, closing connection.
2017-02-08T09:51:04,348  INFO [stderr-redir-1] client.SparkClientImpl: java.nio.channels.ClosedChannelException
2017-02-08T09:51:04,348  INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 WARN
client.RemoteDriver: Shutting down driver because RPC channel was closed.
2017-02-08T09:51:04,348  INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 INFO
client.RemoteDriver: Shutting down remote driver.

2017-02-08T09:51:04,349  INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 ERROR
scheduler.LiveListenerBus: Listener ClientListener threw an exception
2017-02-08T09:51:04,349  INFO [stderr-redir-1] client.SparkClientImpl: java.lang.IllegalStateException:
RPC channel is closed.
2017-02-08T09:51:04,349  INFO [stderr-redir-1] client.SparkClientImpl:  at com.google.common.base.Preconditions.checkState(Preconditions.java:149)
2017-02-08T09:51:04,349  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:276)
2017-02-08T09:51:04,349  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:259)
2017-02-08T09:51:04,349  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.hive.spark.client.RemoteDriver$DriverProtocol.sendMetrics(RemoteDriver.java:270)
2017-02-08T09:51:04,349  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.hive.spark.client.RemoteDriver$ClientListener.onTaskEnd(RemoteDriver.java:490)
2017-02-08T09:51:04,349  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:42)
2017-02-08T09:51:04,349  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
2017-02-08T09:51:04,350  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
2017-02-08T09:51:04,350  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:55)
2017-02-08T09:51:04,350  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37)
2017-02-08T09:51:04,350  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(AsynchronousListenerBus.scala:80)
2017-02-08T09:51:04,350  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
2017-02-08T09:51:04,350  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
2017-02-08T09:51:04,350  INFO [stderr-redir-1] client.SparkClientImpl:  at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
2017-02-08T09:51:04,350  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:64)
2017-02-08T09:51:04,350  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1181)
2017-02-08T09:51:04,350  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:63)
2017-02-08T09:51:04,350  INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 INFO
scheduler.DAGScheduler: Asked to cancel job 2
2017-02-08T09:51:04,350  INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 ERROR
scheduler.LiveListenerBus: Listener ClientListener threw an exception

2017-02-08T09:51:04,351  INFO [stderr-redir-1] client.SparkClientImpl: java.lang.InterruptedException
2017-02-08T09:51:04,351  INFO [stderr-redir-1] client.SparkClientImpl:  at java.lang.Object.wait(Native
Method)
2017-02-08T09:51:04,351  INFO [stderr-redir-1] client.SparkClientImpl:  at java.lang.Object.wait(Object.java:502)
2017-02-08T09:51:04,351  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.spark.scheduler.JobWaiter.awaitResult(JobWaiter.scala:73)
2017-02-08T09:51:04,351  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.spark.SimpleFutureAction.org$apache$spark$SimpleFutureAction$$awaitResult(FutureAction.scala:165)
2017-02-08T09:51:04,351  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.spark.SimpleFutureAction.ready(FutureAction.scala:120)
2017-02-08T09:51:04,351  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.spark.SimpleFutureAction.ready(FutureAction.scala:108)
2017-02-08T09:51:04,351  INFO [stderr-redir-1] client.SparkClientImpl:  at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:86)
2017-02-08T09:51:04,351  INFO [stderr-redir-1] client.SparkClientImpl:  at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:86)
2017-02-08T09:51:04,351  INFO [stderr-redir-1] client.SparkClientImpl:  at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
2017-02-08T09:51:04,351  INFO [stderr-redir-1] client.SparkClientImpl:  at scala.concurrent.Await$.ready(package.scala:86)
2017-02-08T09:51:04,351  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.spark.JavaFutureActionWrapper.getImpl(FutureAction.scala:303)
2017-02-08T09:51:04,351  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.spark.JavaFutureActionWrapper.get(FutureAction.scala:316)
2017-02-08T09:51:04,351  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:362)
2017-02-08T09:51:04,351  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:323)
2017-02-08T09:51:04,351  INFO [stderr-redir-1] client.SparkClientImpl:  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
2017-02-08T09:51:04,351  INFO [stderr-redir-1] client.SparkClientImpl:  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
2017-02-08T09:51:04,351  INFO [stderr-redir-1] client.SparkClientImpl:  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
2017-02-08T09:51:04,351  INFO [stderr-redir-1] client.SparkClientImpl:  at java.lang.Thread.run(Thread.java:745)
2017-02-08T09:51:04,351  INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 ERROR
scheduler.LiveListenerBus: Listener ClientListener threw an exception
2017-02-08T09:51:04,351  INFO [stderr-redir-1] client.SparkClientImpl: java.lang.IllegalStateException:
RPC channel is closed.
2017-02-08T09:51:04,351  INFO [stderr-redir-1] client.SparkClientImpl:  at com.google.common.base.Preconditions.checkState(Preconditions.java:149)
2017-02-08T09:51:04,351  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:276)
2017-02-08T09:51:04,351  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:259)
2017-02-08T09:51:04,351  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.hive.spark.client.RemoteDriver$DriverProtocol.sendMetrics(RemoteDriver.java:270)
2017-02-08T09:51:04,352  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.hive.spark.client.RemoteDriver$ClientListener.onTaskEnd(RemoteDriver.java:490)
2017-02-08T09:51:04,352  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:42)
2017-02-08T09:51:04,352  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
2017-02-08T09:51:04,352  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
2017-02-08T09:51:04,352  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:55)
2017-02-08T09:51:04,352  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37)
2017-02-08T09:51:04,352  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(AsynchronousListenerBus.scala:80)
2017-02-08T09:51:04,352  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
2017-02-08T09:51:04,352  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
2017-02-08T09:51:04,352  INFO [stderr-redir-1] client.SparkClientImpl:  at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
2017-02-08T09:51:04,352  INFO [stderr-redir-1] client.SparkClientImpl:  at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:64)

2017-02-08T09:51:04,654  INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 INFO
util.ShutdownHookManager: Deleting directory /tmp/spark-65f40590-d87f-4701-b374-6b3b2a11538c
2017-02-08T09:52:04,346  WARN [b723c85d-2a7b-469e-bab1-9c165b25e656 main] impl.RemoteSparkJobStatus:
Error getting stage info
java.util.concurrent.TimeoutException
        at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
        at org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobStatus.getSparkStageInfo(RemoteSparkJobStatus.java:161)
~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobStatus.getSparkStageProgress(RemoteSparkJobStatus.java:96)
~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.exec.spark.status.RemoteSparkJobMonitor.startMonitor(RemoteSparkJobMonitor.java:82)
~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobRef.monitorJob(RemoteSparkJobRef.java:60)
~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:101) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1997) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1688) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1419) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1143) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1131) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:400) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
        at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:430) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
        at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:446) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
        at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:749) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:715) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:642) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_60]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_60]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
~[?:1.8.0_60]
        at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_60]
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221) ~[spark-assembly-1.6.2-hadoop2.6.0.jar:1.6.2]
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136) ~[spark-assembly-1.6.2-hadoop2.6.0.jar:1.6.2]
2017-02-08T09:52:04,346 ERROR [b723c85d-2a7b-469e-bab1-9c165b25e656 main] status.SparkJobMonitor:
Failed to monitor Job[ 2] with exception 'java.lang.IllegalStateException(RPC channel is closed.)'
java.lang.IllegalStateException: RPC channel is closed.
        at com.google.common.base.Preconditions.checkState(Preconditions.java:149) ~[guava-14.0.1.jar:?]
        at org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:276) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]

also in container's log, I find Driver still request for executors:

17/02/08 09:51:00 INFO yarn.YarnAllocator: Driver requested a total number of 77 executor(s).
17/02/08 09:51:00 INFO yarn.YarnAllocator: Canceling requests for 2 executor containers
17/02/08 09:51:00 INFO yarn.YarnAllocator: Driver requested a total number of 76 executor(s).
17/02/08 09:51:00 INFO yarn.YarnAllocator: Canceling requests for 1 executor containers
17/02/08 09:51:00 INFO yarn.YarnAllocator: Driver requested a total number of 75 executor(s).
17/02/08 09:51:00 INFO yarn.YarnAllocator: Canceling requests for 1 executor containers
17/02/08 09:51:00 INFO yarn.YarnAllocator: Driver requested a total number of 74 executor(s).
17/02/08 09:51:00 INFO yarn.YarnAllocator: Canceling requests for 1 executor containers
17/02/08 09:51:00 INFO yarn.YarnAllocator: Driver requested a total number of 73 executor(s).
17/02/08 09:51:00 INFO yarn.YarnAllocator: Canceling requests for 1 executor containers
17/02/08 09:51:00 INFO yarn.YarnAllocator: Driver requested a total number of 71 executor(s).
17/02/08 09:51:00 INFO yarn.YarnAllocator: Canceling requests for 2 executor containers
17/02/08 09:51:00 INFO yarn.YarnAllocator: Driver requested a total number of 70 executor(s).
17/02/08 09:51:00 INFO yarn.YarnAllocator: Canceling requests for 1 executor containers
17/02/08 09:51:04 INFO yarn.YarnAllocator: Driver requested a total number of 50 executor(s).
17/02/08 09:51:04 INFO yarn.YarnAllocator: Canceling requests for 0 executor containers
17/02/08 09:51:04 WARN yarn.YarnAllocator: Expected to find pending requests, but found none.
17/02/08 09:51:04 INFO yarn.YarnAllocator: Driver requested a total number of 0 executor(s).
17/02/08 09:51:04 INFO yarn.YarnAllocator: Canceling requests for 0 executor containers
17/02/08 09:51:04 WARN yarn.YarnAllocator: Expected to find pending requests, but found none.
17/02/08 09:51:04 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected!
Shutting down. 192.168.1.1:42777
17/02/08 09:51:04 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected!
Shutting down. hsx-node1:42777
17/02/08 09:51:04 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0
17/02/08 09:51:04 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED
17/02/08 09:51:04 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
17/02/08 09:51:04 INFO yarn.ApplicationMaster: Deleting staging directory .sparkStaging/application_1486453422616_0150
17/02/08 09:51:04 INFO util.ShutdownHookManager: Shutdown hook called


So, is this situation the client to server timeout? I only set hive.spark.job.monitor.timeout=3600s;

> RPCServer.registerClient() erroneously uses server/client handshake timeout for connection
timeout
> --------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-15671
>                 URL: https://issues.apache.org/jira/browse/HIVE-15671
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>    Affects Versions: 1.1.0
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>         Attachments: HIVE-15671.1.patch, HIVE-15671.patch
>
>
> {code}
>   /**
>    * Tells the RPC server to expect a connection from a new client.
>    * ...
>    */
>   public Future<Rpc> registerClient(final String clientId, String secret,
>       RpcDispatcher serverDispatcher) {
>     return registerClient(clientId, secret, serverDispatcher, config.getServerConnectTimeoutMs());
>   }
> {code}
> {{config.getServerConnectTimeoutMs()}} returns value for *hive.spark.client.server.connect.timeout*,
which is meant for timeout for handshake between Hive client and remote Spark driver. Instead,
the timeout should be *hive.spark.client.connect.timeout*, which is for timeout for remote
Spark driver in connecting back to Hive client.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message