It says:

Could not connect to akka.tcp://sparkMaster@192.168.1.222:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@192.168.1.222:7077]

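This means the application could not reach the master at that address, most likely because the master is down or because the URL does not match the one the master is listening on. Make sure your application uses the same version of Spark as the cluster. Also make sure the Spark URL you provide is exactly the one shown in the image below.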

Inline image 1
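
As a quick sanity check before digging further, something like the sketch below can tell you whether anything is listening on the master's port at all. The helper name `can_reach` is made up for illustration, and the check is only a plain TCP connect: it does not prove the Akka association will succeed (a version or hostname mismatch can still fail), it only rules out the "master is down or unreachable" case.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a plain TCP connection to host:port succeeds.

    Hypothetical helper, not part of Spark. It only shows that something
    is listening on the port, not that it is a compatible Spark master.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False
```

Calling `can_reach("192.168.1.222", 7077)` from the machine that runs the driver should return True if the master from the log above is up and reachable from there.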



Thanks
Best Regards

On Tue, Nov 11, 2014 at 1:35 PM, Samarth Mailinglist <mailinglistsamarth@gmail.com> wrote:

This does not work, for some reason:

...
14/11/11 13:30:54 INFO cluster.SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
14/11/11 13:30:54 WARN client.AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@192.168.1.222:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@192.168.1.222:7077]
14/11/11 13:30:54 WARN client.AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@192.168.1.222:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@192.168.1.222:7077]
14/11/11 13:30:54 WARN client.AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@192.168.1.222:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@192.168.1.222:7077]
14/11/11 13:30:54 WARN client.AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@192.168.1.222:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@192.168.1.222:7077]
14/11/11 13:30:54 INFO storage.MemoryStore: ensureFreeSpace(175305) called with curMem=0, maxMem=277842493
14/11/11 13:30:54 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 171.2 KB, free 264.8 MB)
14/11/11 13:30:55 INFO storage.MemoryStore: ensureFreeSpace(12937) called with curMem=175305, maxMem=277842493
14/11/11 13:30:55 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 12.6 KB, free 264.8 MB)
14/11/11 13:30:55 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on terajoin.local:39540 (size: 12.6 KB, free: 265.0 MB)
14/11/11 13:30:55 INFO storage.BlockManagerMaster: Updated info of block broadcast_0_piece0
14/11/11 13:30:55 INFO mapred.FileInputFormat: Total input paths to process : 1
14/11/11 13:30:55 INFO spark.SparkContext: Starting job: runJob at PythonRDD.scala:296
14/11/11 13:30:55 INFO scheduler.DAGScheduler: Got job 0 (runJob at PythonRDD.scala:296) with 1 output partitions (allowLocal=true)
14/11/11 13:30:55 INFO scheduler.DAGScheduler: Final stage: Stage 0(runJob at PythonRDD.scala:296)
14/11/11 13:30:55 INFO scheduler.DAGScheduler: Parents of final stage: List()
14/11/11 13:30:55 INFO scheduler.DAGScheduler: Missing parents: List()
14/11/11 13:30:55 INFO scheduler.DAGScheduler: Submitting Stage 0 (PythonRDD[3] at RDD at PythonRDD.scala:43), which has no missing parents
14/11/11 13:30:55 INFO storage.MemoryStore: ensureFreeSpace(5800) called with curMem=188242, maxMem=277842493
14/11/11 13:30:55 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 5.7 KB, free 264.8 MB)
14/11/11 13:30:55 INFO storage.MemoryStore: ensureFreeSpace(3773) called with curMem=194042, maxMem=277842493
14/11/11 13:30:55 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 3.7 KB, free 264.8 MB)
14/11/11 13:30:55 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on terajoin.local:39540 (size: 3.7 KB, free: 265.0 MB)
14/11/11 13:30:55 INFO storage.BlockManagerMaster: Updated info of block broadcast_1_piece0
14/11/11 13:30:55 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from Stage 0 (PythonRDD[3] at RDD at PythonRDD.scala:43)
14/11/11 13:30:55 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
14/11/11 13:31:10 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
14/11/11 13:31:14 INFO client.AppClient$ClientActor: Connecting to master spark://192.168.1.222:7077...
14/11/11 13:31:14 WARN client.AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@192.168.1.222:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@192.168.1.222:7077]
14/11/11 13:31:14 WARN client.AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@192.168.1.222:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@192.168.1.222:7077]
14/11/11 13:31:14 WARN client.AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@192.168.1.222:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@192.168.1.222:7077]
14/11/11 13:31:14 WARN client.AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@192.168.1.222:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@192.168.1.222:7077]
14/11/11 13:31:25 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
14/11/11 13:31:34 INFO client.AppClient$ClientActor: Connecting to master spark://192.168.1.222:7077...
14/11/11 13:31:34 WARN client.AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@192.168.1.222:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@192.168.1.222:7077]
14/11/11 13:31:34 WARN client.AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@192.168.1.222:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@192.168.1.222:7077]
14/11/11 13:31:34 WARN client.AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@192.168.1.222:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@192.168.1.222:7077]
14/11/11 13:31:34 WARN client.AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@192.168.1.222:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@192.168.1.222:7077]
14/11/11 13:31:40 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
14/11/11 13:31:54 ERROR cluster.SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
14/11/11 13:31:54 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
14/11/11 13:31:54 INFO scheduler.TaskSchedulerImpl: Cancelling stage 0
14/11/11 13:31:54 INFO scheduler.DAGScheduler: Failed to run runJob at PythonRDD.scala:296
14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/metrics/json,null}
14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/kill,null}
14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/,null}
14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/static,null}
14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/json,null}
14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors,null}
14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment/json,null}
14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment,null}
14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd/json,null}
14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd,null}
14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/json,null}
14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage,null}
14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool/json,null}
14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool,null}
14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/json,null}
14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage,null}
14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/json,null}
14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages,null}
Traceback (most recent call last):
  File "/xxx", line 36, in <module>
    model = LogisticRegressionWithSGD.train(trainData)
  File "/usr/local/spark/python/pyspark/mllib/classification.py", line 110, in train
    initialWeights)
  File "/usr/local/spark/python/pyspark/mllib/_common.py", line 430, in _regression_train_wrapper
    initial_weights = _get_initial_weights(initial_weights, data)
  File "/usr/local/spark/python/pyspark/mllib/_common.py", line 415, in _get_initial_weights
    initial_weights = _convert_vector(data.first().features)
  File "/usr/local/spark/python/pyspark/rdd.py", line 1167, in first
    return self.take(1)[0]
  File "/usr/local/spark/python/pyspark/rdd.py", line 1153, in take
    res = self.context.runJob(self, takeUpToNumLeft, p, True)
  File "/usr/local/spark/python/pyspark/context.py", line 770, in runJob
    it = self._jvm.PythonRDD.runJob(self._jsc.sc(), mappedRDD._jrdd, javaPartitions, allowLocal)
  File "/usr/local/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
  File "/usr/local/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
: org.apache.spark.SparkException: Job aborted due to stage failure: All masters are unresponsive! Giving up.
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    at akka.actor.ActorCell.invoke(ActorCell.scala:456)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

14/11/11 13:31:54 INFO ui.SparkUI: Stopped Spark web UI at http://xxxx:4040
14/11/11 13:31:54 INFO scheduler.DAGScheduler: Stopping DAGScheduler
14/11/11 13:31:54 INFO cluster.SparkDeploySchedulerBackend: Shutting down all executors

It only works when I use local.

On Mon, Nov 10, 2014 at 5:09 PM, Akhil Das <akhil@sigmoidanalytics.com> wrote:

Change this:

spark-submit --master local[8] ~/main/py/file --py-files ~/some/other/files

to this:

spark-submit --master spark://blurred-part:7077 ~/main/py/file --py-files ~/some/other/files
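
One thing to watch, as far as I know: spark-submit expects its options before the application file and passes whatever follows the file to the application as arguments, so `--py-files` would normally go before `~/main/py/file`. If you launch from a script, a small sketch like the one below keeps the ordering straight. The function name `spark_submit_args` and the example file names are placeholders I made up, not anything from your setup.

```python
def spark_submit_args(master, app_file, py_files=None):
    """Build a spark-submit argument list with options before the app file.

    Hypothetical helper for illustration only.
    """
    args = ["spark-submit", "--master", master]
    if py_files:
        # --py-files takes a single comma-separated list of files.
        args += ["--py-files", ",".join(py_files)]
    # The application file comes last; anything after it would be
    # treated as an argument to the application itself.
    args.append(app_file)
    return args
```

The resulting list can be handed to `subprocess.call(...)` as-is, for example `subprocess.call(spark_submit_args("spark://blurred-part:7077", "main.py", ["deps.zip"]))`.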


Thanks
Best Regards

On Mon, Nov 10, 2014 at 4:55 PM, Akhil Das <akhil@sigmoidanalytics.com> wrote:
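You are probably running your application in local mode, in which case it never registers with the standalone master and so does not show up on the dashboard. In the application, specify the master as spark://blurred-part:7077 and it will then appear in the list of running applications.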
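
A minimal sketch of what I mean, assuming a PySpark script: `SparkConf.setMaster` and `setAppName` are the standard calls, while the helper names `is_local_master` and `make_context` and the default app name are made up for illustration. Note that, as far as I know, a master set directly on the SparkConf in code takes precedence over the `--master` flag given to spark-submit, so a hard-coded "local" in the script would keep it in local mode regardless of the command line.

```python
def is_local_master(master):
    """Return True if the master string selects local mode, e.g. "local" or "local[8]"."""
    return master == "local" or master.startswith("local[")


def make_context(master, app_name="example-app"):
    """Create a SparkContext that connects to the given master URL.

    PySpark is imported lazily so this file can be loaded without it.
    """
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setMaster(master).setAppName(app_name)
    return SparkContext(conf=conf)
```

With this, `sc = make_context("spark://blurred-part:7077")` followed by `is_local_master(sc.master)` should return False once the application is really talking to the cluster.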

Thanks
Best Regards

On Mon, Nov 10, 2014 at 4:25 PM, Samarth Mailinglist <mailinglistsamarth@gmail.com> wrote:

There are no applications being shown in the dashboard (I am attaching a screenshot):

Inline image 1

This is my spark-env.sh:

SPARK_MASTER_WEBUI_PORT=8888

SPARK_WORKER_INSTANCES=8 # number of worker processes per node

SPARK_HISTORY_OPTS=" -Dspark.history.fs.logDirectory=/usr/local/spark/history-logs/" # config properties only for the history server (e.g. "-Dx=y")

I have started the history server too.