hive-issues mailing list archives

From "Xuefu Zhang (JIRA)" <>
Subject [jira] [Commented] (HIVE-16071) Spark remote driver misuses the timeout in RPC handshake
Date Wed, 08 Mar 2017 14:28:38 GMT


Xuefu Zhang commented on HIVE-16071:

Hi [~lirui], thank you very much for your further investigation. Based on what you described
and my understanding of the code, I have the following thoughts to share:

1. If a network problem happens before the client sends its id, I don't think we can fail the
future, since, as you said, we don't know which one to fail. This is fine and understandable.
However, in this case we still want to close the channel (which is what cancelTask does).
2. If SaslServerHandler detects any problem, I'm hoping that SaslServerHandler.onError() is
called. onError() seems to do the right thing (if the client is known at that point), except
that it misses cancelling the RPC channel.
    protected void onError(Throwable error) {
      if (client != null) {
        if (!client.promise.isDone()) {
          ...

Thus, I'm thinking of two work items:
1. Fix the cancelTask timeout value.
2. Fix #2 above by closing the server channel.
These are to make sure that the channel is closed in both cases, though I'm not sure how
significant it is.
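To make item 2 concrete, here is a minimal sketch of the intended onError behavior. It uses plain-Java stand-ins (a CompletableFuture for the promise and a boolean for the channel state), so the class and field names are illustrative assumptions, not the actual Hive RPC code:

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for the handshake state held by the server-side
// SASL handler; names are illustrative, not the Hive RpcServer classes.
class HandshakeErrorSketch {
    final CompletableFuture<Void> promise = new CompletableFuture<>();
    boolean channelOpen = true;

    // Sketch of the suggested behavior: fail the pending promise when the
    // client is known (existing behavior), and close the channel in every
    // case (the proposed addition).
    void onError(Throwable error, boolean clientKnown) {
        if (clientKnown && !promise.isDone()) {
            promise.completeExceptionally(error);
        }
        channelOpen = false; // proposed: always close, client known or not
    }
}
```

The only change relative to the quoted snippet is that the channel is closed unconditionally, so a handshake failure that happens before the client sends its id still releases the channel.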

What do you think?

> Spark remote driver misuses the timeout in RPC handshake
> --------------------------------------------------------
>                 Key: HIVE-16071
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Chaoyu Tang
>            Assignee: Chaoyu Tang
>         Attachments: HIVE-16071.patch
> Based on its property description in HiveConf and the comments in HIVE-12650,
hive.spark.client.connect.timeout is the timeout for the Spark remote driver to make a socket
connection (channel) to the RPC server. But currently it is also used by the remote driver for
RPC client/server handshaking, which is not right. Instead, hive.spark.client.server.connect.timeout
should be used, and it is already used by the RpcServer in the handshake.
> An error like the following is usually caused by this issue, since the default hive.spark.client.connect.timeout
value (1000 ms) used by the remote driver for handshaking is a little too short.
> {code}
> 17/02/20 08:46:08 ERROR yarn.ApplicationMaster: User class threw exception: java.util.concurrent.ExecutionException: Client closed before SASL negotiation finished.
> java.util.concurrent.ExecutionException: Client closed
before SASL negotiation finished.
>         at io.netty.util.concurrent.AbstractFuture.get(
>         at org.apache.hive.spark.client.RemoteDriver.<init>(
>         at org.apache.hive.spark.client.RemoteDriver.main(
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(
>         at java.lang.reflect.Method.invoke(
>         at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$
> Caused by: Client closed before SASL negotiation finished.
>         at org.apache.hive.spark.client.rpc.Rpc$SaslClientHandler.dispose(
>         at org.apache.hive.spark.client.rpc.SaslHandler.channelInactive(
> {code}
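To make the distinction between the two settings concrete, here is a hedged sketch of which timeout belongs to which phase. The two config keys are real HiveConf properties, but the helper class, method names, and the 90000 ms default are illustrative assumptions (only the 1000 ms default appears in the description above):

```java
// Hypothetical helper illustrating the two timeouts discussed above;
// not part of the Hive codebase.
class RpcTimeoutSketch {
    // Socket connection to the RPC server: hive.spark.client.connect.timeout
    static long socketConnectTimeoutMs(java.util.Map<String, String> conf) {
        return Long.parseLong(
            conf.getOrDefault("hive.spark.client.connect.timeout", "1000"));
    }

    // SASL handshake: hive.spark.client.server.connect.timeout, matching
    // the timeout the RpcServer already applies on its side. The 90000 ms
    // default here is an assumption for illustration.
    static long handshakeTimeoutMs(java.util.Map<String, String> conf) {
        return Long.parseLong(
            conf.getOrDefault("hive.spark.client.server.connect.timeout", "90000"));
    }
}
```

The point of the fix is that the remote driver's handshake should read the second, longer timeout rather than the first, which is intended only for establishing the socket channel.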

This message was sent by Atlassian JIRA
