spark-user mailing list archives

From: Samarth Mailinglist <mailinglistsama...@gmail.com>
Subject: Re: Spark Web UI is not showing Running / Completed / Active Applications
Date: Tue, 11 Nov 2014 09:11:23 GMT
Perfect, thanks. I was using the local IP address rather than the one displayed
in the Web UI. It's working fine now!
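
For reference, a minimal sketch of the fix (the hostname, app name, and file
paths below are placeholders; it assumes the master Web UI reports the master
address as spark://master-host:7077):

    spark-submit --master spark://master-host:7077 ~/main/py/file --py-files ~/some/other/files

or, equivalently, setting the master inside the PySpark application itself:

    # Minimal PySpark sketch; "my-app" and "master-host" are placeholders.
    from pyspark import SparkConf, SparkContext

    # Use the spark:// URL exactly as displayed at the top of the master Web UI,
    # not the node's raw local IP address.
    conf = SparkConf().setAppName("my-app").setMaster("spark://master-host:7077")
    sc = SparkContext(conf=conf)

The important part, as suggested below, is that the master URL in the
application matches the spark:// address shown in the master Web UI.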

On Tue, Nov 11, 2014 at 2:02 PM, Akhil Das <akhil@sigmoidanalytics.com>
wrote:

> It says
>
> Could not connect to akka.tcp://sparkMaster@192.168.1.222:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@192.168.1.222:7077]
>
> Which means your master is down for some reason. Make sure you are using
> the same version of Spark in your application. Also make sure the Spark
> URL you provide matches the one shown in the image below.
>
> [image: Inline image 1]
>
>
>
> Thanks
> Best Regards
>
> On Tue, Nov 11, 2014 at 1:35 PM, Samarth Mailinglist <
> mailinglistsamarth@gmail.com> wrote:
>
>> This does not work, for some reason:
>>
>> ...
>> 14/11/11 13:30:54 INFO cluster.SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
>> 14/11/11 13:30:54 WARN client.AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@192.168.1.222:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@192.168.1.222:7077]
>> 14/11/11 13:30:54 WARN client.AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@192.168.1.222:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@192.168.1.222:7077]
>> 14/11/11 13:30:54 WARN client.AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@192.168.1.222:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@192.168.1.222:7077]
>> 14/11/11 13:30:54 WARN client.AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@192.168.1.222:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@192.168.1.222:7077]
>> 14/11/11 13:30:54 INFO storage.MemoryStore: ensureFreeSpace(175305) called with curMem=0, maxMem=277842493
>> 14/11/11 13:30:54 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 171.2 KB, free 264.8 MB)
>> 14/11/11 13:30:55 INFO storage.MemoryStore: ensureFreeSpace(12937) called with curMem=175305, maxMem=277842493
>> 14/11/11 13:30:55 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 12.6 KB, free 264.8 MB)
>> 14/11/11 13:30:55 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on terajoin.local:39540 (size: 12.6 KB, free: 265.0 MB)
>> 14/11/11 13:30:55 INFO storage.BlockManagerMaster: Updated info of block broadcast_0_piece0
>> 14/11/11 13:30:55 INFO mapred.FileInputFormat: Total input paths to process : 1
>> 14/11/11 13:30:55 INFO spark.SparkContext: Starting job: runJob at PythonRDD.scala:296
>> 14/11/11 13:30:55 INFO scheduler.DAGScheduler: Got job 0 (runJob at PythonRDD.scala:296) with 1 output partitions (allowLocal=true)
>> 14/11/11 13:30:55 INFO scheduler.DAGScheduler: Final stage: Stage 0(runJob at PythonRDD.scala:296)
>> 14/11/11 13:30:55 INFO scheduler.DAGScheduler: Parents of final stage: List()
>> 14/11/11 13:30:55 INFO scheduler.DAGScheduler: Missing parents: List()
>> 14/11/11 13:30:55 INFO scheduler.DAGScheduler: Submitting Stage 0 (PythonRDD[3] at RDD at PythonRDD.scala:43), which has no missing parents
>> 14/11/11 13:30:55 INFO storage.MemoryStore: ensureFreeSpace(5800) called with curMem=188242, maxMem=277842493
>> 14/11/11 13:30:55 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 5.7 KB, free 264.8 MB)
>> 14/11/11 13:30:55 INFO storage.MemoryStore: ensureFreeSpace(3773) called with curMem=194042, maxMem=277842493
>> 14/11/11 13:30:55 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 3.7 KB, free 264.8 MB)
>> 14/11/11 13:30:55 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on terajoin.local:39540 (size: 3.7 KB, free: 265.0 MB)
>> 14/11/11 13:30:55 INFO storage.BlockManagerMaster: Updated info of block broadcast_1_piece0
>> 14/11/11 13:30:55 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from Stage 0 (PythonRDD[3] at RDD at PythonRDD.scala:43)
>> 14/11/11 13:30:55 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
>> 14/11/11 13:31:10 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
>> 14/11/11 13:31:14 INFO client.AppClient$ClientActor: Connecting to master spark://192.168.1.222:7077...
>> 14/11/11 13:31:14 WARN client.AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@192.168.1.222:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@192.168.1.222:7077]
>> 14/11/11 13:31:14 WARN client.AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@192.168.1.222:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@192.168.1.222:7077]
>> 14/11/11 13:31:14 WARN client.AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@192.168.1.222:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@192.168.1.222:7077]
>> 14/11/11 13:31:14 WARN client.AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@192.168.1.222:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@192.168.1.222:7077]
>> 14/11/11 13:31:25 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
>> 14/11/11 13:31:34 INFO client.AppClient$ClientActor: Connecting to master spark://192.168.1.222:7077...
>> 14/11/11 13:31:34 WARN client.AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@192.168.1.222:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@192.168.1.222:7077]
>> 14/11/11 13:31:34 WARN client.AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@192.168.1.222:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@192.168.1.222:7077]
>> 14/11/11 13:31:34 WARN client.AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@192.168.1.222:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@192.168.1.222:7077]
>> 14/11/11 13:31:34 WARN client.AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@192.168.1.222:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@192.168.1.222:7077]
>> 14/11/11 13:31:40 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
>> 14/11/11 13:31:54 ERROR cluster.SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
>> 14/11/11 13:31:54 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
>> 14/11/11 13:31:54 INFO scheduler.TaskSchedulerImpl: Cancelling stage 0
>> 14/11/11 13:31:54 INFO scheduler.DAGScheduler: Failed to run runJob at PythonRDD.scala:296
>> 14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/metrics/json,null}
>> 14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/kill,null}
>> 14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/,null}
>> 14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/static,null}
>> 14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/json,null}
>> 14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors,null}
>> 14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment/json,null}
>> 14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment,null}
>> 14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd/json,null}
>> 14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd,null}
>> 14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/json,null}
>> 14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage,null}
>> 14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool/json,null}
>> 14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool,null}
>> 14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/json,null}
>> 14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage,null}
>> 14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/json,null}
>> 14/11/11 13:31:54 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages,null}
>> Traceback (most recent call last):
>>   File "/xxx", line 36, in <module>
>>     model = LogisticRegressionWithSGD.train(trainData)
>>   File "/usr/local/spark/python/pyspark/mllib/classification.py", line 110, in train
>>     initialWeights)
>>   File "/usr/local/spark/python/pyspark/mllib/_common.py", line 430, in _regression_train_wrapper
>>     initial_weights = _get_initial_weights(initial_weights, data)
>>   File "/usr/local/spark/python/pyspark/mllib/_common.py", line 415, in _get_initial_weights
>>     initial_weights = _convert_vector(data.first().features)
>>   File "/usr/local/spark/python/pyspark/rdd.py", line 1167, in first
>>     return self.take(1)[0]
>>   File "/usr/local/spark/python/pyspark/rdd.py", line 1153, in take
>>     res = self.context.runJob(self, takeUpToNumLeft, p, True)
>>   File "/usr/local/spark/python/pyspark/context.py", line 770, in runJob
>>     it = self._jvm.PythonRDD.runJob(self._jsc.sc(), mappedRDD._jrdd, javaPartitions, allowLocal)
>>   File "/usr/local/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
>>   File "/usr/local/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
>> py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
>> : org.apache.spark.SparkException: Job aborted due to stage failure: All masters are unresponsive! Giving up.
>>     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
>>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>     at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
>>     at scala.Option.foreach(Option.scala:236)
>>     at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
>>     at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
>>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>>     at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>>     at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>>     at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>
>> 14/11/11 13:31:54 INFO ui.SparkUI: Stopped Spark web UI at http://xxxx:4040
>> 14/11/11 13:31:54 INFO scheduler.DAGScheduler: Stopping DAGScheduler
>> 14/11/11 13:31:54 INFO cluster.SparkDeploySchedulerBackend: Shutting down all executors
>>
>> It only works when I use local.
>>
>> On Mon, Nov 10, 2014 at 5:09 PM, Akhil Das <akhil@sigmoidanalytics.com>
>> wrote:
>>
>>> Change this:
>>>
>>> spark-submit --master local[8] ~/main/py/file --py-files ~/some/other/files
>>>
>>> to this:
>>>
>>> spark-submit --master spark://blurred-part:7077 ~/main/py/file --py-files ~/some/other/files
>>>
>>>
>>> Thanks
>>> Best Regards
>>>
>>> On Mon, Nov 10, 2014 at 4:55 PM, Akhil Das <akhil@sigmoidanalytics.com>
>>> wrote:
>>>
>>>> You could be running your application in *local* mode. In the
>>>> application specify the master as spark://blurred-part:7077 and then it
>>>> will appear in the running list.
>>>>
>>>> Thanks
>>>> Best Regards
>>>>
>>>> On Mon, Nov 10, 2014 at 4:25 PM, Samarth Mailinglist <
>>>> mailinglistsamarth@gmail.com> wrote:
>>>>
>>>>> There are no applications being shown in the dashboard (I am attaching
>>>>> a screenshot):
>>>>>
>>>>> [image: Inline image 1]
>>>>>
>>>>> This is my spark-env.sh:
>>>>>
>>>>> SPARK_MASTER_WEBUI_PORT=8888
>>>>>
>>>>> SPARK_WORKER_INSTANCES=8 #to set the number of worker processes per node
>>>>>
>>>>> SPARK_HISTORY_OPTS=" -Dspark.history.fs.logDirectory=/usr/local/spark/history-logs/" # to set config properties only for the history server (e.g. "-Dx=y")
>>>>>
>>>>> I have started the history server too.
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>
>
