spark-user mailing list archives

From Shannon Quinn <squ...@gatech.edu>
Subject Re: Spark standalone network configuration problems
Date Fri, 27 Jun 2014 19:42:43 GMT
Apologies; can you advise as to how I would check that? I can certainly 
SSH from master to machine2.
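A reachability check of the kind Sujeet suggests can be sketched in a few lines of Python; the address and ports below are the ones from the thread's logs and are placeholders, not a confirmed diagnosis:

```python
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Placeholders from the logs -- run from the master:
# port_open("130.49.226.148", 48019)  # worker port (reported open)
# port_open("130.49.226.148", 60949)  # executor port being refused
```

SSH succeeding only proves port 22 is reachable; a check like this (or `nc -zv`) probes the specific ports the executor is advertising.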

On 6/27/14, 3:22 PM, Sujeet Varakhedi wrote:
> Looks like your driver is not able to connect to the remote executor 
> on machine2/130.49.226.148:60949. Can
> you check if the master machine can route to 130.49.226.148?
>
> Sujeet
>
>
> On Fri, Jun 27, 2014 at 12:04 PM, Shannon Quinn <squinn@gatech.edu 
> <mailto:squinn@gatech.edu>> wrote:
>
>     For some reason, commenting out spark.driver.host and
>     spark.driver.port fixed something...and broke something else (or
>     at least revealed another problem). For reference, these are the only
>     lines in my spark-defaults.conf now:
>
>     spark.app.name          myProg
>     spark.master            spark://192.168.1.101:5060
>     spark.executor.memory   8g
>     spark.files.overwrite   true
>
>     It starts up, but has problems with machine2. For some reason,
>     machine2 is having trouble communicating with *itself*. Here are
>     the worker logs of one of the failures (there are 10 before it
>     quits):
>
>
>     Spark assembly has been built with Hive, including Datanucleus
>     jars on classpath
>     14/06/27 14:55:13 INFO ExecutorRunner: Launch command: "java"
>     "-cp"
>     "::/home/spark/spark-1.0.0-bin-hadoop2/conf:/home/spark/spark-1.0.0-bin-hadoop2/lib/spark-assembly-1.0.0-hadoop2.2.0.jar:/home/spark/spark-1.0.0-bin-hadoop2/lib/datanucleus-rdbms-3.2.1.jar:/home/spark/spark-1.0.0-bin-hadoop2/lib/datanucleus-core-3.2.2.jar:/home/spark/spark-1.0.0-bin-hadoop2/lib/datanucleus-api-jdo-3.2.1.jar"
>     "-XX:MaxPermSize=128m" "-Xms8192M" "-Xmx8192M"
>     "org.apache.spark.executor.CoarseGrainedExecutorBackend"
>     "akka.tcp://spark@machine1:46378/user/CoarseGrainedScheduler" "7"
>     "machine2" "8" "akka.tcp://sparkWorker@machine2:48019/user/Worker"
>     "app-20140627144512-0001"
>     14/06/27 14:56:54 INFO Worker: Executor app-20140627144512-0001/7
>     finished with state FAILED message Command exited with code 1
>     exitStatus 1
>     14/06/27 14:56:54 INFO LocalActorRef: Message
>     [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying]
>     from Actor[akka://sparkWorker/deadLetters] to
>     Actor[akka://sparkWorker/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkWorker%40130.49.226.148%3A53561-38#-1924573003]
>     was not delivered. [10] dead letters encountered. This logging can
>     be turned off or adjusted with configuration settings
>     'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
>     14/06/27 14:56:54 ERROR EndpointWriter: AssociationError
>     [akka.tcp://sparkWorker@machine2:48019] ->
>     [akka.tcp://sparkExecutor@machine2:60949]: Error [Association
>     failed with [akka.tcp://sparkExecutor@machine2:60949]] [
>     akka.remote.EndpointAssociationException: Association failed with
>     [akka.tcp://sparkExecutor@machine2:60949]
>     Caused by:
>     akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
>     Connection refused: machine2/130.49.226.148:60949
>     ]
>     14/06/27 14:56:54 INFO Worker: Asked to launch executor
>     app-20140627144512-0001/8 for Funtown, USA
>     14/06/27 14:56:54 ERROR EndpointWriter: AssociationError
>     [akka.tcp://sparkWorker@machine2:48019] ->
>     [akka.tcp://sparkExecutor@machine2:60949]: Error [Association
>     failed with [akka.tcp://sparkExecutor@machine2:60949]] [
>     akka.remote.EndpointAssociationException: Association failed with
>     [akka.tcp://sparkExecutor@machine2:60949]
>     Caused by:
>     akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
>     Connection refused: machine2/130.49.226.148:60949
>     ]
>     14/06/27 14:56:54 ERROR EndpointWriter: AssociationError
>     [akka.tcp://sparkWorker@machine2:48019] ->
>     [akka.tcp://sparkExecutor@machine2:60949]: Error [Association
>     failed with [akka.tcp://sparkExecutor@machine2:60949]] [
>     akka.remote.EndpointAssociationException: Association failed with
>     [akka.tcp://sparkExecutor@machine2:60949]
>     Caused by:
>     akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
>     Connection refused: machine2/130.49.226.148:60949
>     ]
>
>     Port 48019 on machine2 is indeed open, connected, and listening.
>     Any ideas?
>
>     Thanks!
>
>     Shannon
>
>     On 6/27/14, 1:54 AM, sujeetv wrote:
>
>         Try to explicitly set the "spark.driver.host" property to
>         the master's
>         IP.
>         Sujeet
>
>
>
>         --
>         View this message in context:
>         http://apache-spark-user-list.1001560.n3.nabble.com/Spark-standalone-network-configuration-problems-tp8304p8396.html
>         Sent from the Apache Spark User List mailing list archive at
>         Nabble.com.
>
>
>
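The repeated "Connection refused" on port 60949 means nothing is listening at the address the worker advertised for the executor: typically the executor JVM died right after launch, or it bound to a different interface than the one in the Akka URL. A sketch of the settings usually pinned down in this situation; the addresses are taken from this thread and are placeholders, not a confirmed fix:

```
# conf/spark-env.sh on machine2 -- pin the interface Spark binds to:
SPARK_LOCAL_IP=130.49.226.148

# conf/spark-defaults.conf on the master -- pin the driver's address,
# per Sujeet's earlier suggestion:
# spark.driver.host   192.168.1.101
```

Checking the executor's own stderr under machine2's work/ directory (the launch exited with code 1) would distinguish a crashed executor from a binding problem.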

