spark-user mailing list archives

From Irina Fedulova <fedul...@gmail.com>
Subject Re: Akka "connection refused" when running standalone Scala app on Spark 0.9.2
Date Fri, 03 Oct 2014 18:56:53 GMT
Yana, many thanks for looking into this!

I am not running spark-shell in local mode -- I really am starting 
spark-shell with --master spark://master:7077 and running in cluster mode.

Second, I tried setting "spark.driver.host" to "master", both in the 
Scala app when creating the context and in the conf/spark-defaults.conf 
file, but this made no difference. The worker logs still show the same messages:
14/10/03 13:37:30 ERROR remote.EndpointWriter: AssociationError 
[akka.tcp://sparkWorker@host2:51414] -> 
[akka.tcp://sparkExecutor@host2:53851]: Error [Association failed with 
[akka.tcp://sparkExecutor@host2:53851]] [
akka.remote.EndpointAssociationException: Association failed with 
[akka.tcp://sparkExecutor@host2:53851]
Caused by: 
akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: 
Connection refused: host2/xxx.xx.xx.xx:53851
]
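
For reference, this is roughly how I set it when creating the context (a 
minimal sketch; the equivalent conf/spark-defaults.conf entry is shown 
in the comment):

import org.apache.spark.{SparkConf, SparkContext}

// Equivalent line in conf/spark-defaults.conf:
//   spark.driver.host   master
val conf = new SparkConf()
  .setMaster("spark://master:7077")
  .setAppName("MovieLensALS")
  .set("spark.driver.host", "master") // hostname the executors should use to reach the driver
val sc = new SparkContext(conf)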

Note that host1, host2, etc. are slave hostnames, and each slave logs 
this error about itself: host1:<some random port> cannot connect to 
host1:<some random port>.

However, I noticed that after SparkPi runs successfully, the worker log 
is also populated with similar "connection refused" messages, yet this 
does not kill the application... So these worker log entries are 
probably a false clue.



On 03.10.14 19:37, Yana Kadiyska wrote:
> When you're running spark-shell and the example, are you actually
> specifying --master spark://master:7077 as shown here:
> http://spark.apache.org/docs/latest/programming-guide.html#initializing-spark
>
> Because if you're not, your spark-shell is running in local mode and not
> actually connecting to the cluster. Also, if you run spark-shell against
> the cluster, you'll see it listed under Running Applications in the
> master UI. It would be pretty odd for spark-shell to connect
> successfully to the cluster but for your app not to connect... (which is
> why I suspect you're running spark-shell in local mode.)
>
> Another thing to check: the executors need to connect back to your
> driver, so it could be that you have to set the driver host or driver
> port... In fact, looking at your executor log, this seems fairly likely:
> is host1/xxx.xx.xx.xx:45542 the machine where your driver is running? Is
> that host/port reachable from the worker machines?
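>
> Something along these lines, for example (a sketch -- the hostname and
> port here are placeholders; use an address the workers can actually
> resolve and a port that is open between the machines):
>
> import org.apache.spark.SparkConf
>
> val conf = new SparkConf()
>   .set("spark.driver.host", "driver-host") // address reachable from the workers
>   .set("spark.driver.port", "51000")       // fix the driver port instead of a random one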
>
> On Fri, Oct 3, 2014 at 5:32 AM, Irina Fedulova <fedulova@gmail.com> wrote:
>
>     Hi,
>
>     I have set up a Spark 0.9.2 standalone cluster using CDH5 and the
>     pre-built Spark distribution archive for Hadoop 2. I was not using
>     the spark-ec2 scripts because I am not on EC2.
>
>     Spark-shell seems to be working properly -- I am able to perform
>     simple RDD operations, and the SparkPi standalone example works
>     well when run via `run-example`. The web UI shows all workers
>     connected.
>
>     However, my standalone Scala application gets "connection refused"
>     messages. I think this has something to do with configuration,
>     because spark-shell and SparkPi work well. I verified that
>     .setMaster and .setSparkHome are properly assigned within the
>     Scala app; the context creation is sketched below.
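>
>     For reference, the context is created roughly like this (a sketch;
>     the Spark home and jar paths stand in for my real ones):
>
>     import org.apache.spark.{SparkConf, SparkContext}
>
>     val conf = new SparkConf()
>       .setMaster("spark://master:7077")  // standalone master URL
>       .setAppName("MovieLensALS")
>       .setSparkHome("/path/to/spark")    // placeholder for the Spark installation path
>       .setJars(Seq("target/scala-2.10/movielens-als_2.10-0.0.jar"))
>     val sc = new SparkContext(conf)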
>
>     Is there anything else in the configuration of a standalone Scala
>     app on Spark that I am missing?
>     I would very much appreciate any clues.
>
>     Namely, I am trying to run the MovieLensALS.scala example from the
>     AMPCamp big data mini course
>     (http://ampcamp.berkeley.edu/big-data-mini-course/movie-recommendation-with-mllib.html).
>
>     Here is the error I get when I try to run the compiled jar:
>     ---------------
>     root@master:~/machine-learning/scala# sbt/sbt package "run
>     /movielens/medium"
>     Launching sbt from sbt/sbt-launch-0.12.4.jar
>     [info] Loading project definition from
>     /root/training/machine-learning/scala/project
>     [info] Set current project to movielens-als (in build
>     file:/root/training/machine-learning/scala/)
>     [info] Compiling 1 Scala source to
>     /root/training/machine-learning/scala/target/scala-2.10/classes...
>     [warn] there were 2 deprecation warning(s); re-run with -deprecation
>     for details
>     [warn] one warning found
>     [info] Packaging
>     /root/training/machine-learning/scala/target/scala-2.10/movielens-als_2.10-0.0.jar
>     ...
>     [info] Done packaging.
>     [success] Total time: 6 s, completed Oct 2, 2014 1:19:00 PM
>     [info] Running MovieLensALS /movielens/medium
>     master = spark://master:7077
>     log4j:WARN No appenders could be found for logger
>     (akka.event.slf4j.Slf4jLogger).
>     log4j:WARN Please initialize the log4j system properly.
>     log4j:WARN See
>     http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
>     14/10/02 13:19:01 WARN NativeCodeLoader: Unable to load
>     native-hadoop library for your platform... using builtin-java
>     classes where applicable
>     HERE
>     THERE
>     14/10/02 13:19:02 INFO FileInputFormat: Total input paths to process : 1
>     14/10/02 13:19:03 ERROR TaskSchedulerImpl: Lost executor 0 on host2:
>     remote Akka client disassociated
>     14/10/02 13:19:03 WARN TaskSetManager: Lost TID 1 (task 0.0:1)
>     14/10/02 13:19:03 WARN TaskSetManager: Lost TID 0 (task 0.0:0)
>     14/10/02 13:19:03 ERROR TaskSchedulerImpl: Lost executor 4 on host5:
>     remote Akka client disassociated
>     14/10/02 13:19:03 WARN TaskSetManager: Lost TID 3 (task 0.0:1)
>     14/10/02 13:19:03 ERROR TaskSchedulerImpl: Lost executor 1 on host4:
>     remote Akka client disassociated
>     14/10/02 13:19:03 WARN TaskSetManager: Lost TID 2 (task 0.0:0)
>     14/10/02 13:19:03 WARN TaskSetManager: Lost TID 4 (task 0.0:1)
>     14/10/02 13:19:03 ERROR TaskSchedulerImpl: Lost executor 3 on host3:
>     remote Akka client disassociated
>     14/10/02 13:19:03 WARN TaskSetManager: Lost TID 6 (task 0.0:0)
>     14/10/02 13:19:03 ERROR TaskSchedulerImpl: Lost executor 2 on host1:
>     remote Akka client disassociated
>     14/10/02 13:19:03 WARN TaskSetManager: Lost TID 5 (task 0.0:1)
>     14/10/02 13:19:03 WARN TaskSetManager: Lost TID 7 (task 0.0:0)
>     14/10/02 13:19:04 ERROR TaskSchedulerImpl: Lost executor 6 on host4:
>     remote Akka client disassociated
>     14/10/02 13:19:04 WARN TaskSetManager: Lost TID 8 (task 0.0:0)
>     14/10/02 13:19:04 WARN TaskSetManager: Lost TID 9 (task 0.0:1)
>     14/10/02 13:19:04 ERROR TaskSchedulerImpl: Lost executor 5 on host2:
>     remote Akka client disassociated
>     14/10/02 13:19:04 WARN TaskSetManager: Lost TID 10 (task 0.0:1)
>     14/10/02 13:19:04 ERROR TaskSchedulerImpl: Lost executor 7 on host5:
>     remote Akka client disassociated
>     14/10/02 13:19:04 WARN TaskSetManager: Lost TID 11 (task 0.0:0)
>     14/10/02 13:19:04 WARN TaskSetManager: Lost TID 12 (task 0.0:1)
>     14/10/02 13:19:04 ERROR TaskSchedulerImpl: Lost executor 8 on host3:
>     remote Akka client disassociated
>     14/10/02 13:19:04 WARN TaskSetManager: Lost TID 13 (task 0.0:1)
>     14/10/02 13:19:04 ERROR TaskSchedulerImpl: Lost executor 9 on host1:
>     remote Akka client disassociated
>     14/10/02 13:19:04 WARN TaskSetManager: Lost TID 14 (task 0.0:0)
>     14/10/02 13:19:04 WARN TaskSetManager: Lost TID 15 (task 0.0:1)
>     14/10/02 13:19:05 ERROR AppClient$ClientActor: Master removed our
>     application: FAILED; stopping client
>     14/10/02 13:19:05 WARN SparkDeploySchedulerBackend: Disconnected
>     from Spark cluster! Waiting for reconnection...
>     14/10/02 13:19:06 ERROR TaskSchedulerImpl: Lost executor 11 on
>     host5: remote Akka client disassociated
>     14/10/02 13:19:06 WARN TaskSetManager: Lost TID 17 (task 0.0:0)
>     14/10/02 13:19:06 WARN TaskSetManager: Lost TID 16 (task 0.0:1)
>     ---------------
>
>     And this is error log on one of the workers:
>     ---------------
>     14/10/02 13:19:05 INFO worker.Worker: Executor
>     app-20141002131901-0002/9 finished with state FAILED message Command
>     exited with code 1 exitStatus 1
>     14/10/02 13:19:05 INFO actor.LocalActorRef: Message
>     [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying]
>     from Actor[akka://sparkWorker/deadLetters] to
>     Actor[akka://sparkWorker/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkWorker%40xxx.xx.xx.xx%3A57719-15#1504298502]
>     was not delivered. [6] dead letters encountered. This logging can be
>     turned off or adjusted with configuration settings
>     'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
>     14/10/02 13:19:05 ERROR remote.EndpointWriter: AssociationError
>     [akka.tcp://sparkWorker@host1:47421] ->
>     [akka.tcp://sparkExecutor@host1:45542]: Error [Association failed
>     with [akka.tcp://sparkExecutor@host1:45542]] [
>     akka.remote.EndpointAssociationException: Association failed with
>     [akka.tcp://sparkExecutor@host1:45542]
>     Caused by:
>     akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
>     Connection refused: host1/xxx.xx.xx.xx:45542
>     ]
>     (the same AssociationError block is repeated twice more)
>     ---------------
>
>     Thanks!
>     Irina
>
