spark-user mailing list archives

From Roland Johann <roland.joh...@phenetic.io.INVALID>
Subject Re: Spark job fails because of timeout to Driver
Date Fri, 04 Oct 2019 15:14:41 GMT
Hi Jochen,

did you set up the EMR cluster with custom security groups? Can you confirm
that the relevant EC2 instances can connect to each other on the relevant ports?
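One way to check this quickly from a node is a small connectivity probe; a minimal sketch (the host and port below are placeholders, not values from this thread):

```python
import socket

def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Placeholder values -- substitute a node's private DNS name and the
# driver port reported in the YARN logs:
# can_connect("ip-10-0-0-1.ec2.internal", 44000)
```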

Best regards

Jochen Hebbrecht <jochenhebbrecht@gmail.com> wrote on Fri, 4 Oct 2019 at
17:09:

> Hi Jeff,
>
> Thanks! Just tried that, but the same timeout occurs :-( ...
>
> Jochen
>
> On Fri, 4 Oct 2019 at 16:37, Jeff Zhang <zjffdu@gmail.com> wrote:
>
>> You can try increasing the property spark.yarn.am.waitTime (by default it
>> is 100s).
>> Maybe you are doing some very time-consuming operation when initializing
>> the SparkContext, which causes the timeout.
>>
>> See this property here
>> http://spark.apache.org/docs/latest/running-on-yarn.html
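>>
>> For example, an illustrative override at submit time (the value and jar
>> name are placeholders, not recommendations):
>>
>> {code}
>> spark-submit --master yarn --deploy-mode cluster \
>>   --conf spark.yarn.am.waitTime=300s \
>>   your-app.jar
>> {code}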
>>
>>
>> Jochen Hebbrecht <jochenhebbrecht@gmail.com> wrote on Fri, 4 Oct 2019 at 22:08:
>>
>>> Hi,
>>>
>>> I'm using Spark 2.4.2 on AWS EMR 5.24.0. I'm trying to send a Spark job
>>> to the cluster. The job gets accepted, but the YARN application fails
>>> with:
>>>
>>>
>>> {code}
>>> 19/09/27 14:33:35 ERROR ApplicationMaster: Uncaught exception:
>>> java.util.concurrent.TimeoutException: Futures timed out after [100000
>>> milliseconds]
>>> at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
>>> at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
>>> at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
>>> at
>>> org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:468)
>>> at org.apache.spark.deploy.yarn.ApplicationMaster.org
>>> $apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
>>> at
>>> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
>>> at
>>> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
>>> at
>>> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
>>> at
>>> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:779)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>> at javax.security.auth.Subject.doAs(Subject.java:422)
>>> at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
>>> at
>>> org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778)
>>> at
>>> org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
>>> at
>>> org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803)
>>> at
>>> org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
>>> 19/09/27 14:33:35 INFO ApplicationMaster: Final app status: FAILED,
>>> exitCode: 13, (reason: Uncaught exception:
>>> java.util.concurrent.TimeoutException: Futures timed out after [100000
>>> milliseconds]
>>> at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
>>> at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
>>> at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
>>> at
>>> org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:468)
>>> at org.apache.spark.deploy.yarn.ApplicationMaster.org
>>> $apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
>>> at
>>> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
>>> at
>>> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
>>> at
>>> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
>>> at
>>> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:779)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>> at javax.security.auth.Subject.doAs(Subject.java:422)
>>> at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
>>> at
>>> org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778)
>>> at
>>> org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
>>> at
>>> org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803)
>>> at
>>> org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
>>> {code}
>>>
>>> It actually goes wrong at this line:
>>> https://github.com/apache/spark/blob/v2.4.2/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L468
>>>
>>> Now, I'm 100% sure Spark is OK and there's no bug, but there must be
>>> something wrong with my setup. I don't understand the code of the
>>> ApplicationMaster, so could somebody explain to me what it is trying to reach?
>>> Where exactly does the connection time out? Then at least I can debug it
>>> further, because I don't have a clue what it is doing :-)
>>>
>>> Thanks for any help!
>>> Jochen
>>>
>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>>
> --


*Roland Johann*
Software Developer/Data Engineer

*phenetic GmbH*
Lütticher Straße 10, 50674 Köln, Germany

Mobil: +49 172 365 26 46
Mail: roland.johann@phenetic.io
Web: phenetic.io

Handelsregister: Amtsgericht Köln (HRB 92595)
Geschäftsführer: Roland Johann, Uwe Reimann
