spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jochen Hebbrecht <jochenhebbre...@gmail.com>
Subject Spark job fails because of timeout to Driver
Date Fri, 04 Oct 2019 14:07:54 GMT
Hi,

I'm using Spark 2.4.2 on AWS EMR 5.24.0. I'm trying to send a Spark job
towards the cluster. Thhe job gets accepted, but the YARN application fails
with:


{code}
19/09/27 14:33:35 ERROR ApplicationMaster: Uncaught exception:
java.util.concurrent.TimeoutException: Futures timed out after [100000
milliseconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
at
org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:468)
at org.apache.spark.deploy.yarn.ApplicationMaster.org
$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
at
org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
at
org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
at
org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
at
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:779)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
at
org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778)
at
org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
at
org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803)
at
org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
19/09/27 14:33:35 INFO ApplicationMaster: Final app status: FAILED,
exitCode: 13, (reason: Uncaught exception:
java.util.concurrent.TimeoutException: Futures timed out after [100000
milliseconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
at
org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:468)
at org.apache.spark.deploy.yarn.ApplicationMaster.org
$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
at
org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
at
org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
at
org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
at
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:779)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
at
org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778)
at
org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
at
org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803)
at
org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
{code}

It actually goes wrong at this line:
https://github.com/apache/spark/blob/v2.4.2/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L468

Now, I'm 100% sure Spark is OK and there's no bug, but there must be
something wrong with my setup. I don't understand the code of the
ApplicationMaster, so could somebody explain me what it is trying to reach?
Where exactly does the connection timeout? So at least I can debug it
further because I don't have a clue what it is doing :-)

Thanks for any help!
Jochen

Mime
View raw message