spark-user mailing list archives

From Jeff Evans <>
Subject Re: Possible to limit number of IPC retries on spark-submit?
Date Fri, 31 Jan 2020 21:53:24 GMT
Figured out the answer, eventually.  The magic property name, in this case,
is yarn.client.failover-max-attempts (prefixed with spark.hadoop. in the
case of Spark, of course).  For a full explanation, see the StackOverflow
answer <> I just added.
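
For anyone who builds the spark-submit command line from a script, here is a minimal sketch of how the prefixing might look. The helper names, the value of 1, and the jar path are illustrative assumptions, not something taken from the thread; only the property name and the spark.hadoop. prefix come from the message above.

```python
import shlex

# Spark copies any "spark.hadoop.*" setting into the Hadoop Configuration
# it creates, so Hadoop/YARN client properties are passed with this prefix.
HADOOP_PREFIX = "spark.hadoop."


def hadoop_conf_args(props):
    """Turn Hadoop/YARN properties into spark-submit --conf arguments."""
    args = []
    for key, value in props.items():
        args += ["--conf", f"{HADOOP_PREFIX}{key}={value}"]
    return args


def build_submit_command(props, app_jar, main_class):
    """Assemble an argv list for spark-submit (hypothetical helper)."""
    return [
        "spark-submit",
        "--master", "yarn",
        "--class", main_class,
        *hadoop_conf_args(props),
        app_jar,
    ]


# The value 1 and the jar path are placeholders chosen for illustration.
cmd = build_submit_command(
    {"yarn.client.failover-max-attempts": 1},
    "path/to/spark-examples.jar",
    "org.apache.spark.examples.SparkPi",
)
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run(cmd) as-is; shlex.join is only used here to show the equivalent shell command.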

On Wed, Jan 22, 2020 at 5:02 PM Jeff Evans <> wrote:

> Greetings,
> Is it possible to limit the number of times the IPC client retries upon a
> spark-submit invocation?  For context, see this StackOverflow post
> <>.
> In essence, I am trying to call spark-submit on a Kerberized cluster,
> without having valid Kerberos tickets available.  This is deliberate, and
> I'm not truly facing a Kerberos issue.  Rather, this is the
> easiest reproducible case of "long IPC retry" I have been able to trigger.
> In this particular case, the following errors are printed (presumably by
> the launcher):
> 20/01/22 15:49:32 INFO retry.RetryInvocationHandler: Failed on local
> exception: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details :
> local host is: "node-1.cluster/"; destination host is: "node-1.cluster":8032; ,
> while invoking ApplicationClientProtocolPBClientImpl.getClusterMetrics over
> null after 1 failover attempts. Trying to failover after sleeping for 35160ms.
> This repeats 30 times before the launcher finally gives up.
> As indicated in the answer on that StackOverflow post, the relevant Hadoop
> properties should be ipc.client.connect.max.retries and/or
> ipc.client.connect.max.retries.on.sasl.  However, in testing on Spark
> 2.4.0 (on CDH 6.1), I am not able to get either of these to take effect (it
> still retries 30 times regardless).  I am trying the SparkPi example, and
> specifying them with --conf spark.hadoop.ipc.client.connect.max.retries
> and/or --conf spark.hadoop.ipc.client.connect.max.retries.on.sasl.
> Any ideas on what I could be doing wrong, or why I can't get these
> properties to take effect?
