spark-user mailing list archives

From Akhil Das <ak...@sigmoidanalytics.com>
Subject Re: Spark Web UI is not showing Running / Completed / Active Applications
Date Tue, 11 Nov 2014 08:32:03 GMT
It says:

Could not connect to akka.tcp://sparkMaster@192.168.1.222:7077:
akka.remote.EndpointAssociationException:
Association failed with [akka.tcp://sparkMaster@192.168.1.222:7077]

This means your master is down or unreachable for some reason. Make sure your application
uses the same version of Spark as the cluster. Also make sure the master URL you pass is
exactly the one shown in the master web UI (see the image below).

[image: Inline image 1]
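
For example, assuming the master really should be at 192.168.1.222:7077 (the address from your
logs), a quick check you could run from the driver machine:

    import socket

    # Raises an exception if nothing is listening on the master's RPC port
    socket.create_connection(("192.168.1.222", 7077), timeout=5).close()
    print("master port 7077 is reachable")

If that fails, restart the standalone master (sbin/start-master.sh) and look at its log for
errors; if it succeeds but the application still cannot register, compare the Spark version
shown on the master web UI with the version of the installation you are submitting from.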



Thanks
Best Regards

On Tue, Nov 11, 2014 at 1:35 PM, Samarth Mailinglist <
mailinglistsamarth@gmail.com> wrote:

> This does not work, for some reason:
>
> ...
> 14/11/11 13:30:54 INFO cluster.SparkDeploySchedulerBackend: SchedulerBackend is ready
for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
> 14/11/11 13:30:54 WARN client.AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@192.168.1.222:7077:
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@192.168.1.222:7077]
> 14/11/11 13:30:54 WARN client.AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@192.168.1.222:7077:
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@192.168.1.222:7077]
> 14/11/11 13:30:54 WARN client.AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@192.168.1.222:7077:
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@192.168.1.222:7077]
> 14/11/11 13:30:54 WARN client.AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@192.168.1.222:7077:
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@192.168.1.222:7077]
> 14/11/11 13:30:54 INFO storage.MemoryStore: ensureFreeSpace(175305) called with curMem=0,
maxMem=277842493
> 14/11/11 13:30:54 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory
(estimated size 171.2 KB, free 264.8 MB)
> 14/11/11 13:30:55 INFO storage.MemoryStore: ensureFreeSpace(12937) called with curMem=175305,
maxMem=277842493
> 14/11/11 13:30:55 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes
in memory (estimated size 12.6 KB, free 264.8 MB)
> 14/11/11 13:30:55 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on
terajoin.local:39540 (size: 12.6 KB, free: 265.0 MB)
> 14/11/11 13:30:55 INFO storage.BlockManagerMaster: Updated info of block broadcast_0_piece0
> 14/11/11 13:30:55 INFO mapred.FileInputFormat: Total input paths to process : 1
> 14/11/11 13:30:55 INFO spark.SparkContext: Starting job: runJob at PythonRDD.scala:296
> 14/11/11 13:30:55 INFO scheduler.DAGScheduler: Got job 0 (runJob at PythonRDD.scala:296)
with 1 output partitions (allowLocal=true)
> 14/11/11 13:30:55 INFO scheduler.DAGScheduler: Final stage: Stage 0(runJob at PythonRDD.scala:296)
> 14/11/11 13:30:55 INFO scheduler.DAGScheduler: Parents of final stage: List()
> 14/11/11 13:30:55 INFO scheduler.DAGScheduler: Missing parents: List()
> 14/11/11 13:30:55 INFO scheduler.DAGScheduler: Submitting Stage 0 (PythonRDD[3] at RDD
at PythonRDD.scala:43), which has no missing parents
> 14/11/11 13:30:55 INFO storage.MemoryStore: ensureFreeSpace(5800) called with curMem=188242,
maxMem=277842493
> 14/11/11 13:30:55 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory
(estimated size 5.7 KB, free 264.8 MB)
> 14/11/11 13:30:55 INFO storage.MemoryStore: ensureFreeSpace(3773) called with curMem=194042,
maxMem=277842493
> 14/11/11 13:30:55 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes
in memory (estimated size 3.7 KB, free 264.8 MB)
> 14/11/11 13:30:55 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on
terajoin.local:39540 (size: 3.7 KB, free: 265.0 MB)
> 14/11/11 13:30:55 INFO storage.BlockManagerMaster: Updated info of block broadcast_1_piece0
> 14/11/11 13:30:55 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from Stage
0 (PythonRDD[3] at RDD at PythonRDD.scala:43)
> 14/11/11 13:30:55 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
> 14/11/11 13:31:10 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any
resources; check your cluster UI to ensure that workers are registered and have sufficient
memory
> 14/11/11 13:31:14 INFO client.AppClient$ClientActor: Connecting to master spark://192.168.1.222:7077...
> 14/11/11 13:31:14 WARN client.AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@192.168.1.222:7077:
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@192.168.1.222:7077]
> 14/11/11 13:31:14 WARN client.AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@192.168.1.222:7077:
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@192.168.1.222:7077]
> 14/11/11 13:31:14 WARN client.AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@192.168.1.222:7077:
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@192.168.1.222:7077]
> 14/11/11 13:31:14 WARN client.AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@192.168.1.222:7077:
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@192.168.1.222:7077]
> 14/11/11 13:31:25 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any
resources; check your cluster UI to ensure that workers are registered and have sufficient
memory
> 14/11/11 13:31:34 INFO client.AppClient$ClientActor: Connecting to master spark://192.168.1.222:7077...
> 14/11/11 13:31:34 WARN client.AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@192.168.1.222:7077:
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@192.168.1.222:7077]
> 14/11/11 13:31:34 WARN client.AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@192.168.1.222:7077:
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@192.168.1.222:7077]
> 14/11/11 13:31:34 WARN client.AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@192.168.1.222:7077:
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@192.168.1.222:7077]
> 14/11/11 13:31:34 WARN client.AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@192.168.1.222:7077:
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@192.168.1.222:7077]
> 14/11/11 13:31:40 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any
resources; check your cluster UI to ensure that workers are registered and have sufficient
memory
> 14/11/11 13:31:54 ERROR cluster.SparkDeploySchedulerBackend: Application has been killed.
Reason: All masters are unresponsive! Giving up.
> 14/11/11 13:31:54 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks
have all completed, from pool
> 14/11/11 13:31:54 INFO scheduler.TaskSchedulerImpl: Cancelling stage 0
> 14/11/11 13:31:54 INFO scheduler.DAGScheduler: Failed to run runJob at PythonRDD.scala:296
> 14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/metrics/json,null}
> 14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/kill,null}
> 14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/,null}
> 14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/static,null}
> 14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/json,null}
> 14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors,null}
> 14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment/json,null}
> 14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment,null}
> 14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd/json,null}
> 14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd,null}
> 14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/json,null}
> 14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage,null}
> 14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool/json,null}
> 14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool,null}
> 14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/json,null}
> 14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage,null}
> 14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/json,null}
> 14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages,null}
> Traceback (most recent call last):
>   File "/xxx", line 36, in <module>
>     model = LogisticRegressionWithSGD.train(trainData)
>   File "/usr/local/spark/python/pyspark/mllib/classification.py", line 110, in train
>     initialWeights)
>   File "/usr/local/spark/python/pyspark/mllib/_common.py", line 430, in _regression_train_wrapper
>     initial_weights = _get_initial_weights(initial_weights, data)
>   File "/usr/local/spark/python/pyspark/mllib/_common.py", line 415, in _get_initial_weights
>     initial_weights = _convert_vector(data.first().features)
>   File "/usr/local/spark/python/pyspark/rdd.py", line 1167, in first
>     return self.take(1)[0]
>   File "/usr/local/spark/python/pyspark/rdd.py", line 1153, in take
>     res = self.context.runJob(self, takeUpToNumLeft, p, True)
>   File "/usr/local/spark/python/pyspark/context.py", line 770, in runJob
>     it = self._jvm.PythonRDD.runJob(self._jsc.sc(), mappedRDD._jrdd,
javaPartitions, allowLocal)
>   File "/usr/local/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line
538, in __call__
>   File "/usr/local/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300,
in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
> : org.apache.spark.SparkException: Job aborted due to stage failure: All masters are
unresponsive! Giving up.
>     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>     at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
>     at scala.Option.foreach(Option.scala:236)
>     at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
>     at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>     at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>     at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>     at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>
> 14/11/11 13:31:54 INFO ui.SparkUI: Stopped Spark web UI at http://xxxx:4040
> 14/11/11 13:31:54 INFO scheduler.DAGScheduler: Stopping DAGScheduler
> 14/11/11 13:31:54 INFO cluster.SparkDeploySchedulerBackend: Shutting down all executors
>
> It only works when I use local.
>
> On Mon, Nov 10, 2014 at 5:09 PM, Akhil Das <akhil@sigmoidanalytics.com>
> wrote:
>
>> Change this:
>>
>> spark-submit --master local[8] ~/main/py/file --py-files
>> ~/some/other/files
>>
>> to this:
>>
>> spark-submit --master spark://blurred-part:7077 ~/main/py/file --py-files
>> ~/some/other/files
>>
>>
>> Thanks
>> Best Regards
>>
>> On Mon, Nov 10, 2014 at 4:55 PM, Akhil Das <akhil@sigmoidanalytics.com>
>> wrote:
>>
>>> You could be running your application in *local* mode. In the
>>> application specify the master as spark://blurred-part:7077 and then it
>>> will appear in the running list.
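>>> For example, in the PySpark script itself that would look roughly like this (a sketch;
>>> replace the URL with the exact spark:// URL shown at the top of the master web UI):
>>>
>>>     from pyspark import SparkConf, SparkContext
>>>
>>>     # Point the application at the standalone master instead of local mode
>>>     conf = SparkConf().setMaster("spark://blurred-part:7077").setAppName("my-app")
>>>     sc = SparkContext(conf=conf)
>>>
>>> or pass the same URL to spark-submit with --master.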
>>>
>>> Thanks
>>> Best Regards
>>>
>>> On Mon, Nov 10, 2014 at 4:25 PM, Samarth Mailinglist <
>>> mailinglistsamarth@gmail.com> wrote:
>>>
>>>> There are no applications being shown in the dashboard (I am attaching
>>>> a screenshot):
>>>>
>>>> [image: Inline image 1]
>>>>
>>>> This is my spark-env.sh:
>>>>
>>>> SPARK_MASTER_WEBUI_PORT=8888
>>>>
>>>> SPARK_WORKER_INSTANCES=8 #to set the number of worker processes per node
>>>>
>>>> SPARK_HISTORY_OPTS=" -Dspark.history.fs.logDirectory=/usr/local/spark/history-logs/"
>>>> # to set config properties only for the history server (e.g. "-Dx=y")
>>>>
>>>> I have started the history server too.
>>>>
>>>
>>>
>>
>
