spark-user mailing list archives

From: Shannon Quinn <squ...@gatech.edu>
Subject: Re: Spark standalone network configuration problems
Date: Fri, 27 Jun 2014 21:25:00 GMT
I switched which machine was the master and which was the dedicated 
worker, and now it works just fine. I discovered that machine2 is on 
my department's DMZ while machine1 is not, so I suspect the 
departmental firewall was causing the problems; moving the master to 
machine2 seems to have resolved them.
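For anyone who hits the same symptoms: before shuffling machines 
around, it's worth checking that the ports involved are reachable in 
both directions. A rough sketch (the ports are taken from the logs 
quoted below; yours will differ, since executors bind ephemeral ports 
by default):

    # from the master, probe the worker's registered port
    nc -zv machine2 48019
    # from the worker, probe the driver's scheduler port
    nc -zv machine1 46378

If a probe is refused or times out in only one direction, a firewall 
or DMZ boundary between the machines is the likely culprit.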

Thank you all very much for your help. I'm sure I'll have other 
questions soon :)

Regards,
Shannon

On 6/27/14, 3:22 PM, Sujeet Varakhedi wrote:
> Looks like your driver is not able to connect to the remote executor 
> on machine2/130.49.226.148:60949. Can you check if the master machine 
> can route to 130.49.226.148?
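>
> Something like this run from the master is a quick check (a rough 
> sketch; 60949 is the executor port from the log below):
>
>     ping -c 3 130.49.226.148
>     nc -zv 130.49.226.148 60949
>
> If ping succeeds but the port probe is refused, you are looking at a 
> firewall or a process that isn't listening, rather than a routing 
> problem.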
>
> Sujeet
>
>
> On Fri, Jun 27, 2014 at 12:04 PM, Shannon Quinn <squinn@gatech.edu> 
> wrote:
>
>     For some reason, commenting out spark.driver.host and
>     spark.driver.port fixed something...and broke something else (or
>     at least revealed another problem). For reference, the only lines
>     I have in my spark-defaults.conf now:
>
>     spark.app.name          myProg
>     spark.master            spark://192.168.1.101:5060
>     spark.executor.memory   8g
>     spark.files.overwrite   true
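>
>     The two lines I commented out were of this form (placeholder 
>     values, not my exact settings):
>
>     spark.driver.host       <driver-ip>
>     spark.driver.port       <driver-port>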
>
>     It starts up, but has problems with machine2. For some reason,
>     machine2 is having trouble communicating with *itself*. Here are
>     the worker logs of one of the failures (there are 10 before it
>     quits):
>
>
>     Spark assembly has been built with Hive, including Datanucleus
>     jars on classpath
>     14/06/27 14:55:13 INFO ExecutorRunner: Launch command: "java"
>     "-cp"
>     "::/home/spark/spark-1.0.0-bin-hadoop2/conf:/home/spark/spark-1.0.0-bin-hadoop2/lib/spark-assembly-1.0.0-hadoop2.2.0.jar:/home/spark/spark-1.0.0-bin-hadoop2/lib/datanucleus-rdbms-3.2.1.jar:/home/spark/spark-1.0.0-bin-hadoop2/lib/datanucleus-core-3.2.2.jar:/home/spark/spark-1.0.0-bin-hadoop2/lib/datanucleus-api-jdo-3.2.1.jar"
>     "-XX:MaxPermSize=128m" "-Xms8192M" "-Xmx8192M"
>     "org.apache.spark.executor.CoarseGrainedExecutorBackend"
>     "akka.tcp://spark@machine1:46378/user/CoarseGrainedScheduler" "7"
>     "machine2" "8" "akka.tcp://sparkWorker@machine2:48019/user/Worker"
>     "app-20140627144512-0001"
>     14/06/27 14:56:54 INFO Worker: Executor app-20140627144512-0001/7
>     finished with state FAILED message Command exited with code 1
>     exitStatus 1
>     14/06/27 14:56:54 INFO LocalActorRef: Message
>     [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying]
>     from Actor[akka://sparkWorker/deadLetters] to
>     Actor[akka://sparkWorker/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkWorker%40130.49.226.148%3A53561-38#-1924573003]
>     was not delivered. [10] dead letters encountered. This logging can
>     be turned off or adjusted with configuration settings
>     'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
>     14/06/27 14:56:54 ERROR EndpointWriter: AssociationError
>     [akka.tcp://sparkWorker@machine2:48019] ->
>     [akka.tcp://sparkExecutor@machine2:60949]: Error [Association
>     failed with [akka.tcp://sparkExecutor@machine2:60949]] [
>     akka.remote.EndpointAssociationException: Association failed with
>     [akka.tcp://sparkExecutor@machine2:60949]
>     Caused by:
>     akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
>     Connection refused: machine2/130.49.226.148:60949
>     ]
>     14/06/27 14:56:54 INFO Worker: Asked to launch executor
>     app-20140627144512-0001/8 for Funtown, USA
>     14/06/27 14:56:54 ERROR EndpointWriter: AssociationError
>     [akka.tcp://sparkWorker@machine2:48019] ->
>     [akka.tcp://sparkExecutor@machine2:60949]: Error [Association
>     failed with [akka.tcp://sparkExecutor@machine2:60949]] [
>     akka.remote.EndpointAssociationException: Association failed with
>     [akka.tcp://sparkExecutor@machine2:60949]
>     Caused by:
>     akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
>     Connection refused: machine2/130.49.226.148:60949
>     ]
>     14/06/27 14:56:54 ERROR EndpointWriter: AssociationError
>     [akka.tcp://sparkWorker@machine2:48019] ->
>     [akka.tcp://sparkExecutor@machine2:60949]: Error [Association
>     failed with [akka.tcp://sparkExecutor@machine2:60949]] [
>     akka.remote.EndpointAssociationException: Association failed with
>     [akka.tcp://sparkExecutor@machine2:60949]
>     Caused by:
>     akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
>     Connection refused: machine2/130.49.226.148:60949
>     ]
>
>     Port 48019 on machine2 is indeed open, connected, and listening.
>     Any ideas?
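>
>     A quick way to double-check what is actually listening (a 
>     sketch; ss is from iproute2, netstat works equally well):
>
>     ss -tlnp | grep -e 48019 -e 60949
>
>     Note that 60949 belongs to the executor the worker is trying to 
>     launch, so it will only show up while that JVM is alive.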
>
>     Thanks!
>
>     Shannon
>
>     On 6/27/14, 1:54 AM, sujeetv wrote:
>
>         Try explicitly setting the "spark.driver.host" property to
>         the master's IP.
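>
>         For example, in spark-defaults.conf (the IP is taken from
>         your spark.master line):
>
>         spark.driver.host       192.168.1.101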
>         Sujeet

