spark-user mailing list archives

From Irina Fedulova <fedul...@gmail.com>
Subject Re: Akka "connection refused" when running standalone Scala app on Spark 0.9.2
Date Sat, 04 Oct 2014 07:24:23 GMT
I've finally resolved my issue! It turned out that it was not related to 
driver-master-worker connectivity settings.

The problem was caused by an mllib jar version mismatch:
I noticed that I was using the build.sbt from the AMPCamp example, which
referenced mllib v0.9.0, but I was running it on Spark 0.9.2.
SBT downloaded the mllib 0.9.0 jar during packaging, but I did not pay
attention to it.

However, it looks like the mllib 0.9.0 jar does not work correctly on Spark
0.9.2. Once I changed the mllib version in the dependency list, the
application works perfectly.
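
For reference, a minimal sketch of the kind of dependency change involved
(illustrative only, not the exact AMPCamp build.sbt -- the point is that the
Spark/mllib artifact versions must match the cluster's Spark version, 0.9.2
here rather than 0.9.0):
---------------
// build.sbt (sketch); name/version taken from the packaged jar name in the log
name := "movielens-als"

version := "0.0"

scalaVersion := "2.10.3"

libraryDependencies ++= Seq(
  // Both artifacts pinned to the Spark version the cluster actually runs.
  "org.apache.spark" %% "spark-core"  % "0.9.2",
  "org.apache.spark" %% "spark-mllib" % "0.9.2"
)
---------------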

Thanks anyway for your time and willingness to help!

Irina

On 04.10.14 00:17, Yana Kadiyska wrote:
> I don't think it's a red herring... (btw. spark.driver.host needs to be
> set to the IP or FQDN of the machine where you're running the program).
>
> I am running 0.9.2 on CDH4 and the beginning of my executor log looks
> like below (I've obfuscated the IP -- this is the log from executor
> a100-2-200-245). My driver is running on a100-2-200-238. I am not
> specifically setting spark.driver.host or the port but, depending on how
> your machine is set up, you might need to:
>
> |SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 14/10/03 18:14:48 INFO slf4j.Slf4jLogger: Slf4jLogger started
> 14/10/03 18:14:48 INFO Remoting: Starting remoting
> 14/10/03 18:14:48 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkExecutor@a100-2-200-245:56760]
> 14/10/03 18:14:48 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkExecutor@a100-2-200-245:56760]
> **14/10/03 18:14:48 INFO executor.CoarseGrainedExecutorBackend: Connecting to driver: akka.tcp://spark@a100-2-200-238:61505/user/CoarseGrainedScheduler**
> 14/10/03 18:14:48 INFO worker.WorkerWatcher: Connecting to worker akka.tcp://sparkWorker@a100-2-200-245:48067/user/Worker
> 14/10/03 18:14:48 INFO worker.WorkerWatcher: Successfully connected to akka.tcp://sparkWorker@a100-2-200-245:48067/user/Worker
> **14/10/03 18:14:49 INFO executor.CoarseGrainedExecutorBackend: Successfully registered with driver**
> 14/10/03 18:14:49 INFO slf4j.Slf4jLogger: Slf4jLogger started
> 14/10/03 18:14:49 INFO Remoting: Starting remoting
> |
>
> If you look at the lines with **, this is where the executor successfully
> connects back to the driver, and at this point you should see your app show
> up in the UI under "Running applications"... The worker log you're posting --
> is that the log stored under work/app-<id>/<executor-id>/stderr? The first
> line you show in that log is
>
>   INFO worker.Worker: Executor
>      app-20141002131901-0002/9 finished with state FAILED
>
> but I imagine something prior to that would say why the executor failed?
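
(For reference, a minimal sketch of setting the driver address explicitly from
the application, assuming the Spark 0.9-era SparkConf API; the hostname and
port below are placeholders, not values from this thread:)
---------------
import org.apache.spark.SparkConf

// Tell executors where to connect back to the driver. The host must be an
// address of the driver machine that is reachable from the workers.
val conf = new SparkConf()
  .set("spark.driver.host", "driver-host.example.com") // placeholder FQDN/IP of the driver machine
  .set("spark.driver.port", "51000")                    // optional: pin the port (e.g. when a firewall is involved)
---------------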
>
> On Fri, Oct 3, 2014 at 2:56 PM, Irina Fedulova <fedulova@gmail.com> wrote:
>
>     Yana, many thanks for looking into this!
>
>     I am not running spark-shell in local mode -- I am really starting
>     spark-shell with --master spark://master:7077 and running it against
>     the cluster.
>
>     The second thing is that I tried to set "spark.driver.host" to "master",
>     both in the Scala app when creating the context and in the
>     conf/spark-defaults.conf file, but this did not make any difference.
>     The worker logs still show the same messages:
>     14/10/03 13:37:30 ERROR remote.EndpointWriter: AssociationError
>     [akka.tcp://sparkWorker@host2:51414] ->
>     [akka.tcp://sparkExecutor@host2:53851]: Error [Association failed
>     with [akka.tcp://sparkExecutor@host2:53851]] [
>     akka.remote.EndpointAssociationException: Association failed with
>     [akka.tcp://sparkExecutor@host2:53851]
>     Caused by:
>     akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
>     Connection refused: host2/xxx.xx.xx.xx:53851
>     ]
>
>     Note that host1, host2 etc. are slave hostnames, and each slave has an
>     error message about itself: host1:<some random port> cannot connect
>     to host1:<some random port>.
>
>     However, I noticed that after a successful SparkPi run the app log is
>     also populated with similar "connection refused" messages, yet this
>     does not lead to application death... So these worker logs are
>     probably a false clue.
>
>
>
>     On 03.10.14 19:37, Yana Kadiyska wrote:
>
>         when you're running spark-shell and the example, are you actually
>         specifying --master spark://master:7077 as shown here:
>         http://spark.apache.org/docs/latest/programming-guide.html#initializing-spark
>
>         because if you're not, your spark-shell is running in local mode
>         and not
>         actually connecting to the cluster. Also, if you run spark-shell
>         against
>         the cluster, you'll see it listed under the Running applications
>         in the
>         master UI. It would be pretty odd for spark-shell to connect
>         successfully to the cluster but for your app not to
>         connect... (which is why I suspect that you're running spark-shell
>         in local mode).
>
>         Another thing to check: the executors need to connect back to your
>         driver, so it could be that you have to set the driver host or
>         driver port... in fact, looking at your executor log, this seems
>         fairly likely: is host1/xxx.xx.xx.xx:45542 the machine where your
>         driver is running? Is that host/port reachable from the worker
>         machines?
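
(As a quick sanity check on the local-vs-cluster point above -- a sketch,
assuming the Spark 0.9 shell conventions:)
---------------
// Launch the shell against the cluster rather than in local mode, e.g.
// (per the 0.9.x docs) MASTER=spark://master:7077 ./bin/spark-shell
// Then, inside the shell, confirm which master the context is bound to:
sc.master   // expected: "spark://master:7077", not "local" or "local[*]"
---------------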
>
>         On Fri, Oct 3, 2014 at 5:32 AM, Irina Fedulova
>         <fedulova@gmail.com> wrote:
>
>              Hi,
>
>              I have set up a Spark 0.9.2 standalone cluster using CDH5 and
>              the pre-built Spark distribution archive for Hadoop 2. I was not
>              using the spark-ec2 scripts because I am not on EC2.
>
>              Spark-shell seems to be working properly -- I am able to perform
>              simple RDD operations, and e.g. the SparkPi standalone example
>              works well when run via `run-example`. The Web UI shows all
>              workers connected.
>
>              However, my standalone Scala application gets "connection
>              refused" messages. I think this has something to do with
>              configuration, because spark-shell and SparkPi work well. I
>              verified that .setMaster and .setSparkHome are properly assigned
>              within the Scala app.
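
(For reference, a minimal sketch of the kind of context setup described above,
using the Spark 0.9 SparkConf API; the Spark home and jar path below are
placeholders, and the jar name is taken from the sbt output later in this
message:)
---------------
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("spark://master:7077")
  .setAppName("MovieLensALS")
  .setSparkHome("/opt/spark-0.9.2")   // placeholder SPARK_HOME on the cluster nodes
  .setJars(Seq("target/scala-2.10/movielens-als_2.10-0.0.jar"))  // ship the packaged app jar to the workers

val sc = new SparkContext(conf)
---------------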
>
>              Is there anything else in the configuration of a standalone
>              Scala app on Spark that I am missing?
>              I would very much appreciate any clues.
>
>              Namely, I am trying to run the MovieLensALS.scala example from
>              the AMPCamp big data mini course
>              (http://ampcamp.berkeley.edu/big-data-mini-course/movie-recommendation-with-mllib.html).
>
>              Here is the error I get when I try to run the compiled jar:
>              ---------------
>              root@master:~/machine-learning/scala# sbt/sbt package "run
>              /movielens/medium"
>              Launching sbt from sbt/sbt-launch-0.12.4.jar
>              [info] Loading project definition from
>              /root/training/machine-learning/scala/project
>              [info] Set current project to movielens-als (in build
>              file:/root/training/machine-learning/scala/)
>              [info] Compiling 1 Scala source to
>
>              /root/training/machine-learning/scala/target/scala-2.10/classes...
>              [warn] there were 2 deprecation warning(s); re-run with
>         -deprecation
>              for details
>              [warn] one warning found
>              [info] Packaging
>
>              /root/training/machine-learning/scala/target/scala-2.10/movielens-als_2.10-0.0.jar
>              ...
>              [info] Done packaging.
>              [success] Total time: 6 s, completed Oct 2, 2014 1:19:00 PM
>              [info] Running MovieLensALS /movielens/medium
>              master = spark://master:7077
>              log4j:WARN No appenders could be found for logger
>              (akka.event.slf4j.Slf4jLogger).
>              log4j:WARN Please initialize the log4j system properly.
>              log4j:WARN See
>              http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
>              14/10/02 13:19:01 WARN NativeCodeLoader: Unable to load
>              native-hadoop library for your platform... using builtin-java
>              classes where applicable
>              HERE
>              THERE
>              14/10/02 13:19:02 INFO FileInputFormat: Total input paths
>         to process : 1
>              14/10/02 13:19:03 ERROR TaskSchedulerImpl: Lost executor 0
>         on host2:
>              remote Akka client disassociated
>              14/10/02 13:19:03 WARN TaskSetManager: Lost TID 1 (task 0.0:1)
>              14/10/02 13:19:03 WARN TaskSetManager: Lost TID 0 (task 0.0:0)
>              14/10/02 13:19:03 ERROR TaskSchedulerImpl: Lost executor 4
>         on host5:
>              remote Akka client disassociated
>              14/10/02 13:19:03 WARN TaskSetManager: Lost TID 3 (task 0.0:1)
>              14/10/02 13:19:03 ERROR TaskSchedulerImpl: Lost executor 1
>         on host4:
>              remote Akka client disassociated
>              14/10/02 13:19:03 WARN TaskSetManager: Lost TID 2 (task 0.0:0)
>              14/10/02 13:19:03 WARN TaskSetManager: Lost TID 4 (task 0.0:1)
>              14/10/02 13:19:03 ERROR TaskSchedulerImpl: Lost executor 3
>         on host3:
>              remote Akka client disassociated
>              14/10/02 13:19:03 WARN TaskSetManager: Lost TID 6 (task 0.0:0)
>              14/10/02 13:19:03 ERROR TaskSchedulerImpl: Lost executor 2
>         on host1:
>              remote Akka client disassociated
>              14/10/02 13:19:03 WARN TaskSetManager: Lost TID 5 (task 0.0:1)
>              14/10/02 13:19:03 WARN TaskSetManager: Lost TID 7 (task 0.0:0)
>              14/10/02 13:19:04 ERROR TaskSchedulerImpl: Lost executor 6
>         on host4:
>              remote Akka client disassociated
>              14/10/02 13:19:04 WARN TaskSetManager: Lost TID 8 (task 0.0:0)
>              14/10/02 13:19:04 WARN TaskSetManager: Lost TID 9 (task 0.0:1)
>              14/10/02 13:19:04 ERROR TaskSchedulerImpl: Lost executor 5
>         on host2:
>              remote Akka client disassociated
>              14/10/02 13:19:04 WARN TaskSetManager: Lost TID 10 (task 0.0:1)
>              14/10/02 13:19:04 ERROR TaskSchedulerImpl: Lost executor 7
>         on host5:
>              remote Akka client disassociated
>              14/10/02 13:19:04 WARN TaskSetManager: Lost TID 11 (task 0.0:0)
>              14/10/02 13:19:04 WARN TaskSetManager: Lost TID 12 (task 0.0:1)
>              14/10/02 13:19:04 ERROR TaskSchedulerImpl: Lost executor 8
>         on host3:
>              remote Akka client disassociated
>              14/10/02 13:19:04 WARN TaskSetManager: Lost TID 13 (task 0.0:1)
>              14/10/02 13:19:04 ERROR TaskSchedulerImpl: Lost executor 9
>         on host1:
>              remote Akka client disassociated
>              14/10/02 13:19:04 WARN TaskSetManager: Lost TID 14 (task 0.0:0)
>              14/10/02 13:19:04 WARN TaskSetManager: Lost TID 15 (task 0.0:1)
>              14/10/02 13:19:05 ERROR AppClient$ClientActor: Master
>         removed our
>              application: FAILED; stopping client
>              14/10/02 13:19:05 WARN SparkDeploySchedulerBackend:
>         Disconnected
>              from Spark cluster! Waiting for reconnection...
>              14/10/02 13:19:06 ERROR TaskSchedulerImpl: Lost executor 11 on
>              host5: remote Akka client disassociated
>              14/10/02 13:19:06 WARN TaskSetManager: Lost TID 17 (task 0.0:0)
>              14/10/02 13:19:06 WARN TaskSetManager: Lost TID 16 (task 0.0:1)
>              ---------------
>
>              And this is the error log on one of the workers:
>              ---------------
>              14/10/02 13:19:05 INFO worker.Worker: Executor
>              app-20141002131901-0002/9 finished with state FAILED
>         message Command
>              exited with code 1 exitStatus 1
>              14/10/02 13:19:05 INFO actor.LocalActorRef: Message
>              [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying]
>              from Actor[akka://sparkWorker/deadLetters] to
>              Actor[akka://sparkWorker/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkWorker%40xxx.xx.xx.xx%3A57719-15#1504298502]
>              was not delivered. [6] dead letters encountered. This logging can be
>              turned off or adjusted with configuration settings
>              'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
>              14/10/02 13:19:05 ERROR remote.EndpointWriter: AssociationError
>              [akka.tcp://sparkWorker@host1:47421] ->
>              [akka.tcp://sparkExecutor@host1:45542]: Error [Association failed
>              with [akka.tcp://sparkExecutor@host1:45542]] [
>              akka.remote.EndpointAssociationException: Association failed with
>              [akka.tcp://sparkExecutor@host1:45542]
>              Caused by:
>              akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
>              Connection refused: host1/xxx.xx.xx.xx:45542
>              ]
>              14/10/02 13:19:05 ERROR remote.EndpointWriter: AssociationError
>              [akka.tcp://sparkWorker@host1:47421] ->
>              [akka.tcp://sparkExecutor@host1:45542]: Error [Association failed
>              with [akka.tcp://sparkExecutor@host1:45542]] [
>              akka.remote.EndpointAssociationException: Association failed with
>              [akka.tcp://sparkExecutor@host1:45542]
>              Caused by:
>              akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
>              Connection refused: host1/xxx.xx.xx.xx:45542
>              ]
>              14/10/02 13:19:05 ERROR remote.EndpointWriter: AssociationError
>              [akka.tcp://sparkWorker@host1:47421] ->
>              [akka.tcp://sparkExecutor@host1:45542]: Error [Association failed
>              with [akka.tcp://sparkExecutor@host1:45542]] [
>              akka.remote.EndpointAssociationException: Association failed with
>              [akka.tcp://sparkExecutor@host1:45542]
>              Caused by:
>              akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
>              Connection refused: host1/xxx.xx.xx.xx:45542
>              ---------------
>
>              Thanks!
>              Irina
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

