spark-issues mailing list archives

From "Hyukjin Kwon (Jira)" <j...@apache.org>
Subject [jira] [Resolved] (SPARK-29276) Spark job fails because of timeout to Driver
Date Fri, 04 Oct 2019 08:58:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-29276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-29276.
----------------------------------
    Resolution: Invalid

Let's ask questions on the mailing list or Stack Overflow instead; you are more likely to get a
good answer there.

> Spark job fails because of timeout to Driver
> --------------------------------------------
>
>                 Key: SPARK-29276
>                 URL: https://issues.apache.org/jira/browse/SPARK-29276
>             Project: Spark
>          Issue Type: Question
>          Components: Spark Core
>    Affects Versions: 2.4.2
>            Reporter: Jochen Hebbrecht
>            Priority: Major
>
> Hi,
> I'm using Spark 2.4.2 on AWS EMR 5.24.0. I'm trying to submit a Spark job to the cluster.
> The job gets accepted, but the YARN application fails with:
> {code}
> 19/09/27 14:33:35 ERROR ApplicationMaster: Uncaught exception: 
> java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds]
> 	at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
> 	at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
> 	at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
> 	at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:468)
> 	at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
> 	at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
> 	at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
> 	at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
> 	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:779)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
> 	at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778)
> 	at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
> 	at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803)
> 	at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
> 19/09/27 14:33:35 INFO ApplicationMaster: Final app status: FAILED, exitCode: 13, (reason:
> Uncaught exception: java.util.concurrent.TimeoutException: Futures timed out after [100000
> milliseconds]
> 	at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
> 	at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
> 	at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
> 	at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:468)
> 	at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
> 	at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
> 	at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
> 	at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
> 	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:779)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
> 	at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778)
> 	at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
> 	at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803)
> 	at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
> {code}
> It actually goes wrong at this line: https://github.com/apache/spark/blob/v2.4.2/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L468
> Now, I'm 100% sure Spark is OK and there's no bug, but there must be something wrong
> with my setup. I don't understand the code of the ApplicationMaster, so could somebody explain
> to me what it is trying to reach? Where exactly does the connection time out? Then at least I
> can debug it further, because right now I don't have a clue what it is doing :-)
> Thanks for any help!
> Jochen
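For context on the question above: at the line linked in the report, the ApplicationMaster in
cluster mode blocks on a promise that the user-class (driver) thread is expected to complete once
it has created the SparkContext; the 100000 ms in the log matches the default of
`spark.yarn.am.waitTime` (100s) in Spark 2.4, so the timeout usually means the driver thread never
finished initializing in time. The snippet below is a minimal sketch of that wait pattern only, not
Spark's actual code; the object and method names are hypothetical.

```scala
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration._

// Sketch of the ApplicationMaster.runDriver wait pattern (hypothetical names):
// one thread blocks on a promise that another thread completes once the
// SparkContext is ready. If the promise is never completed within the wait
// time, Await.result throws java.util.concurrent.TimeoutException, which is
// what surfaces in the log above.
object AmWaitSketch {
  def waitForContext(ready: Promise[String], waitTime: Duration): String =
    Await.result(ready.future, waitTime)

  def main(args: Array[String]): Unit = {
    val ready = Promise[String]()
    ready.success("SparkContext initialized") // driver thread signals readiness
    println(waitForContext(ready, 1.second))
  }
}
```

In the failing case the promise is never completed, so a first debugging step is to look at the
driver's own log for why SparkContext creation stalled, rather than at the timeout itself.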



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

