spark-user mailing list archives

From igor cabral uchoa <igorucho...@yahoo.com.br.INVALID>
Subject Re: Spark job fails because of timeout to Driver
Date Fri, 04 Oct 2019 15:42:14 GMT
Hi Roland!
Which deploy mode are you using when you submit your applications? Is it client or cluster
mode?
Regards,




On Friday, October 4, 2019, 12:37 PM, Roland Johann <roland.johann@phenetic.io.INVALID>
wrote:

These are dynamic port ranges and depend on the configuration of your cluster. Each job has
its own application master, so there can't be just one port. If I remember correctly, the
default EMR setup creates worker security groups with unrestricted traffic within the group,
i.e. between the worker nodes. Depending on your security requirements, I suggest that you
start with a default-like setup and afterwards determine the ports and port ranges from the
docs to further restrict traffic between the nodes.
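
To confirm whether one node can actually reach another on a given port, a small TCP probe can be run from the source instance. The following is a minimal sketch in Python using only the standard library. The function name `can_connect` is made up for illustration, and the host and port passed to it are placeholders to be filled in from the cluster's own configuration, not values taken from this thread.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

Calling `can_connect(host, port)` from each instance against the others gives a quick picture of which paths a security group blocks. A `False` result only says that the TCP connection failed; it does not say why.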
Kind regards
Jochen Hebbrecht <jochenhebbrecht@gmail.com> wrote on Fri, Oct 4, 2019 at 17:16:

Hi Roland,
We do indeed have custom security groups. Can you tell me exactly which instances need to
be able to reach which others?
For example, is it from the master instance to the driver instance? And which port should
be open?

Jochen
On Fri, Oct 4, 2019 at 17:14, Roland Johann <roland.johann@phenetic.io> wrote:

Hi Jochen,
did you set up the EMR cluster with custom security groups? Can you confirm that the relevant
EC2 instances can connect to each other through the relevant ports?
Best regards
Jochen Hebbrecht <jochenhebbrecht@gmail.com> wrote on Fri, Oct 4, 2019 at 17:09:

Hi Jeff,
Thanks! Just tried that, but the same timeout occurs :-( ...

Jochen
On Fri, Oct 4, 2019 at 16:37, Jeff Zhang <zjffdu@gmail.com> wrote:

You can try increasing the property spark.yarn.am.waitTime (the default is 100s). Maybe
you are doing some very time-consuming operation when initializing the SparkContext, which
causes the timeout.
The property is documented here: http://spark.apache.org/docs/latest/running-on-yarn.html
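
As a minimal sketch of how that property could be raised at submit time, the Python snippet below assembles a `spark-submit` command line that passes it via `--conf`. It assumes cluster deploy mode on YARN, which is what the `runDriver` frame in the stack trace suggests. The helper name `build_spark_submit`, the jar name, the main class, and the `300s` value are all hypothetical placeholders.

```python
import shlex


def build_spark_submit(app_jar, main_class, am_wait_time="300s"):
    """Build a spark-submit argument list that raises spark.yarn.am.waitTime."""
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        # The property suggested above; 300s is an arbitrary example value.
        "--conf", f"spark.yarn.am.waitTime={am_wait_time}",
        "--class", main_class,
        app_jar,
    ]


# Hypothetical jar and class names, for illustration only.
cmd = build_spark_submit("my-job.jar", "com.example.MyJob")
print(" ".join(shlex.quote(part) for part in cmd))
```

A longer wait only helps if the SparkContext does eventually get created; if initialization is blocked for another reason, the same timeout will simply occur later.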

Jochen Hebbrecht <jochenhebbrecht@gmail.com> wrote on Fri, Oct 4, 2019 at 10:08 PM:


Hi,

I'm using Spark 2.4.2 on AWS EMR 5.24.0. I'm trying to submit a Spark job to the cluster.
The job gets accepted, but the YARN application fails with:


{code}
19/09/27 14:33:35 ERROR ApplicationMaster: Uncaught exception: 
java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds]
 at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
 at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
 at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
 at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:468)
 at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
 at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
 at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
 at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
 at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:779)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
 at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778)
 at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
 at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803)
 at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
19/09/27 14:33:35 INFO ApplicationMaster: Final app status: FAILED, exitCode: 13, (reason:
Uncaught exception: java.util.concurrent.TimeoutException: Futures timed out after [100000
milliseconds]
 at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
 at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
 at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
 at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:468)
 at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
 at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
 at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
 at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
 at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:779)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
 at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778)
 at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
 at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803)
 at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
{code}

It actually goes wrong at this line: https://github.com/apache/spark/blob/v2.4.2/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L468

Now, I'm 100% sure Spark is OK and there's no bug, but there must be something wrong with
my setup. I don't understand the code of the ApplicationMaster, so could somebody explain
to me what it is trying to reach? Where exactly does the connection time out? Then at least
I can debug it further, because I don't have a clue what it is doing :-)

Thanks for any help!
Jochen
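
Judging from the stack trace, the failing call sits in `runDriver`, which the ApplicationMaster runs in cluster mode: it starts the user application in a separate thread and then waits for that application to create its SparkContext, giving up after spark.yarn.am.waitTime. The 100000 milliseconds in the exception match that property's default of 100s. The Python sketch below is only an analogy of this wait-with-timeout pattern, not Spark's code; the function names, the returned string, and the durations are made up for illustration.

```python
import threading
import time
from concurrent.futures import Future, TimeoutError as FutureTimeoutError


def start_user_application(context_promise, init_seconds):
    """Simulate user code that needs init_seconds before its context is ready."""
    def run():
        time.sleep(init_seconds)
        context_promise.set_result("context-ready")

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread


def run_driver(init_seconds, wait_seconds):
    """Wait for the simulated context; return it, or None if the wait times out."""
    context_promise = Future()
    start_user_application(context_promise, init_seconds)
    try:
        # Blocks until the user thread fulfils the promise or the wait expires.
        return context_promise.result(timeout=wait_seconds)
    except FutureTimeoutError:
        return None
```

In this analogy, `run_driver` returns `None` whenever the simulated initialization takes longer than the allowed wait, regardless of what is holding the user code up.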




-- 
Best Regards

Jeff Zhang

-- 

Roland Johann
Software Developer/Data Engineer

phenetic GmbH
Lütticher Straße 10, 50674 Köln, Germany

Mobil: +49 172 365 26 46
Mail: roland.johann@phenetic.io
Web: phenetic.io

Handelsregister: Amtsgericht Köln (HRB 92595)
Geschäftsführer: Roland Johann, Uwe Reimann





