spark-user mailing list archives

From Jeff Zhang <zjf...@gmail.com>
Subject Re: Spark job fails because of timeout to Driver
Date Fri, 04 Oct 2019 14:37:27 GMT
You can try increasing the property spark.yarn.am.waitTime (the default is
100s).
Maybe you are doing some very time-consuming operation while initializing
the SparkContext, which causes the timeout.

The property is documented here:
http://spark.apache.org/docs/latest/running-on-yarn.html
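
As a rough sketch of how the property could be raised at submit time: the
300s value, the class name com.example.MyJob, and the jar name my-job.jar
below are placeholders and do not come from this thread. The property is only
used in cluster deploy mode, where the ApplicationMaster waits for the
SparkContext to be initialized, so setting it from inside the application
would likely come too late.

```shell
# Placeholder values: adjust the wait time, main class, and jar to your job.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.am.waitTime=300s \
  --class com.example.MyJob \
  my-job.jar
```

If the job still fails after the longer wait, the work done before the
SparkContext is created is probably the place to look.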


On Fri, Oct 4, 2019 at 10:08 PM, Jochen Hebbrecht <jochenhebbrecht@gmail.com> wrote:

> Hi,
>
> I'm using Spark 2.4.2 on AWS EMR 5.24.0. I'm trying to submit a Spark job
> to the cluster. The job gets accepted, but the YARN application fails
> with:
>
>
> {code}
> 19/09/27 14:33:35 ERROR ApplicationMaster: Uncaught exception:
> java.util.concurrent.TimeoutException: Futures timed out after [100000
> milliseconds]
> at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
> at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
> at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:468)
> at org.apache.spark.deploy.yarn.ApplicationMaster.org
> $apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:779)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
> 19/09/27 14:33:35 INFO ApplicationMaster: Final app status: FAILED,
> exitCode: 13, (reason: Uncaught exception:
> java.util.concurrent.TimeoutException: Futures timed out after [100000
> milliseconds]
> at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
> at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
> at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:468)
> at org.apache.spark.deploy.yarn.ApplicationMaster.org
> $apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:779)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
> {code}
>
> It actually goes wrong at this line:
> https://github.com/apache/spark/blob/v2.4.2/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L468
>
> Now, I'm 100% sure Spark is OK and there's no bug, so there must be
> something wrong with my setup. I don't understand the code of the
> ApplicationMaster, so could somebody explain to me what it is trying to
> reach? Where exactly does the connection time out? Then I can at least
> debug it further, because I don't have a clue what it is doing :-)
>
> Thanks for any help!
> Jochen
>


-- 
Best Regards

Jeff Zhang
