spark-user mailing list archives

From Akhil Das <ak...@sigmoidanalytics.com>
Subject Re: Spark not working with mesos
Date Tue, 03 Jun 2014 14:55:59 GMT
1. Make sure the spark-*.tgz you created with make-distribution.sh is
accessible by all the slave nodes.

2. Check the logs on the worker nodes (the executor's stdout and stderr
files in its Mesos sandbox directory).
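A minimal sketch of point 1 for a standalone Scala application. It assumes the tarball has already been uploaded somewhere every Mesos slave can fetch it; the master host and HDFS path below are placeholders, not values from this thread. `spark.executor.uri` is the setting the Spark 0.9.1 "Running on Mesos" guide uses to tell executors where to download Spark. For `spark-shell`, the same guide sets it through `SPARK_EXECUTOR_URI` in `conf/spark-env.sh`, and the driver also needs `MESOS_NATIVE_LIBRARY` pointing at `libmesos.so`.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object MesosExecutorUriCheck {
  def main(args: Array[String]): Unit = {
    // Placeholder values: replace with your Mesos master and the real
    // location of the tarball produced by make-distribution.sh.
    val conf = new SparkConf()
      .setMaster("mesos://mesos-master.example.com:5050")
      .setAppName("MesosExecutorUriCheck")
      // Each slave downloads Spark from this URI when it launches an executor,
      // so it must be reachable from all of them (for example HDFS or HTTP),
      // not a path that exists only on the driver machine.
      .set("spark.executor.uri", "hdfs://namenode.example.com/spark/spark-0.9.1.tgz")

    val sc = new SparkContext(conf)
    try {
      // Same job as in the question; on a healthy cluster it prints 1 through 9.
      val result = sc.parallelize(1 to 10000).filter(_ < 10).collect()
      println(result.mkString(", "))
    } finally {
      sc.stop()
    }
  }
}
```

If this job still loses its executors, the stderr file in the executor's Mesos sandbox (point 2) is where a failed download or launch usually shows up.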



Thanks
Best Regards


On Tue, Jun 3, 2014 at 8:13 PM, praveshjain1991 <praveshjain1991@gmail.com> wrote:

> I set up Spark 0.9.1 to run on Mesos 0.13.0 using the steps described here
> <https://spark.apache.org/docs/0.9.1/running-on-mesos.html>. The Mesos UI
> shows two workers registered. I want to run these commands in the Spark
> shell:
>
> > scala> val data = 1 to 10000
> > data: scala.collection.immutable.Range.Inclusive = Range(1, 2, 3, 4, 5, 6,
> > 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
> > 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41,
> > 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58,
> > 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75,
> > 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92,
> > 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107,
> > 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121,
> > 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135,
> > 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149,
> > 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163,
> > 164, 165, 166, 167, 168, 169, 170...
>
>
> > scala> val distData = sc.parallelize(data)
> > distData: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:14
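Because `filter(_ < 10)` keeps only the values below 10, a working cluster should return the nine elements 1 to 9 from `collect()`. The plain-Scala sketch below computes the same thing without a SparkContext, so nothing in it touches Mesos; it only gives the value to compare against. The object name is made up for illustration.

```scala
object ExpectedFilterResult {
  // What distData.filter(_ < 10).collect() should return for data = 1 to 10000,
  // computed on an ordinary local Scala range.
  def expected: Array[Int] = (1 to 10000).filter(_ < 10).toArray

  def main(args: Array[String]): Unit =
    println(expected.mkString("Array(", ", ", ")")) // prints Array(1, 2, 3, 4, 5, 6, 7, 8, 9)
}
```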
>
> Now, when I run the collect method, the following error occurs:
>
> > scala> distData.filter(_< 10).collect()
> 14/06/03 19:54:55 INFO SparkContext: Starting job: collect at <console>:17
> 14/06/03 19:54:55 INFO DAGScheduler: Got job 0 (collect at <console>:17) with 8 output partitions (allowLocal=false)
> 14/06/03 19:54:55 INFO DAGScheduler: Final stage: Stage 0 (collect at <console>:17)
> 14/06/03 19:54:55 INFO DAGScheduler: Parents of final stage: List()
> 14/06/03 19:54:55 INFO DAGScheduler: Missing parents: List()
> 14/06/03 19:54:55 INFO DAGScheduler: Submitting Stage 0 (FilteredRDD[1] at filter at <console>:17), which has no missing parents
> 14/06/03 19:54:55 INFO DAGScheduler: Submitting 8 missing tasks from Stage 0 (FilteredRDD[1] at filter at <console>:17)
> 14/06/03 19:54:55 INFO TaskSchedulerImpl: Adding task set 0.0 with 8 tasks
> 14/06/03 19:54:55 INFO TaskSetManager: Starting task 0.0:0 as TID 0 on executor 201406031732-3213994176-5050-6320-11: IMPETUS-DSRV05.impetus.co.in (PROCESS_LOCAL)
> 14/06/03 19:54:55 INFO TaskSetManager: Serialized task 0.0:0 as 1338 bytes in 8 ms
> 14/06/03 19:54:55 INFO TaskSetManager: Starting task 0.0:1 as TID 1 on executor 201406031732-3213994176-5050-6320-10: IMPETUS-DSRV04.impetus.co.in (PROCESS_LOCAL)
> 14/06/03 19:54:55 INFO TaskSetManager: Serialized task 0.0:1 as 1338 bytes in 0 ms
> 14/06/03 19:54:55 INFO TaskSetManager: Starting task 0.0:2 as TID 2 on executor 201406031732-3213994176-5050-6320-11: IMPETUS-DSRV05.impetus.co.in (PROCESS_LOCAL)
> 14/06/03 19:54:55 INFO TaskSetManager: Serialized task 0.0:2 as 1338 bytes in 0 ms
> 14/06/03 19:54:55 INFO TaskSetManager: Starting task 0.0:3 as TID 3 on executor 201406031732-3213994176-5050-6320-10: IMPETUS-DSRV04.impetus.co.in (PROCESS_LOCAL)
> 14/06/03 19:54:55 INFO TaskSetManager: Serialized task 0.0:3 as 1338 bytes in 1 ms
> 14/06/03 19:54:55 INFO TaskSetManager: Starting task 0.0:4 as TID 4 on executor 201406031732-3213994176-5050-6320-11: IMPETUS-DSRV05.impetus.co.in (PROCESS_LOCAL)
> 14/06/03 19:54:55 INFO TaskSetManager: Serialized task 0.0:4 as 1338 bytes in 0 ms
> 14/06/03 19:54:55 INFO TaskSetManager: Starting task 0.0:5 as TID 5 on executor 201406031732-3213994176-5050-6320-10: IMPETUS-DSRV04.impetus.co.in (PROCESS_LOCAL)
> 14/06/03 19:54:55 INFO TaskSetManager: Serialized task 0.0:5 as 1338 bytes in 0 ms
> 14/06/03 19:54:55 INFO TaskSetManager: Starting task 0.0:6 as TID 6 on executor 201406031732-3213994176-5050-6320-11: IMPETUS-DSRV05.impetus.co.in (PROCESS_LOCAL)
> 14/06/03 19:54:55 INFO TaskSetManager: Serialized task 0.0:6 as 1338 bytes in 0 ms
> 14/06/03 19:54:55 INFO TaskSetManager: Starting task 0.0:7 as TID 7 on executor 201406031732-3213994176-5050-6320-10: IMPETUS-DSRV04.impetus.co.in (PROCESS_LOCAL)
> 14/06/03 19:54:55 INFO TaskSetManager: Serialized task 0.0:7 as 1338 bytes in 0 ms
> 14/06/03 19:54:56 INFO TaskSetManager: Re-queueing tasks for 201406031732-3213994176-5050-6320-10 from TaskSet 0.0
> 14/06/03 19:54:56 WARN TaskSetManager: Lost TID 5 (task 0.0:5)
> 14/06/03 19:54:56 WARN TaskSetManager: Lost TID 7 (task 0.0:7)
> 14/06/03 19:54:56 WARN TaskSetManager: Lost TID 1 (task 0.0:1)
> 14/06/03 19:54:56 WARN TaskSetManager: Lost TID 3 (task 0.0:3)
> 14/06/03 19:54:56 INFO DAGScheduler: Executor lost: 201406031732-3213994176-5050-6320-10 (epoch 0)
> 14/06/03 19:54:56 INFO BlockManagerMasterActor: Trying to remove executor 201406031732-3213994176-5050-6320-10 from BlockManagerMaster.
> 14/06/03 19:54:56 INFO BlockManagerMaster: Removed 201406031732-3213994176-5050-6320-10 successfully in removeExecutor
> 14/06/03 19:54:56 INFO TaskSetManager: Starting task 0.0:3 as TID 8 on executor 201406031732-3213994176-5050-6320-11: IMPETUS-DSRV05.impetus.co.in (PROCESS_LOCAL)
> 14/06/03 19:54:56 INFO TaskSetManager: Serialized task 0.0:3 as 1338 bytes in 0 ms
> 14/06/03 19:54:56 INFO DAGScheduler: Host gained which was in lost list earlier: IMPETUS-DSRV04.impetus.co.in
> 14/06/03 19:54:56 INFO TaskSetManager: Starting task 0.0:1 as TID 9 on executor 201406031732-3213994176-5050-6320-10: IMPETUS-DSRV04.impetus.co.in (PROCESS_LOCAL)
> 14/06/03 19:54:56 INFO TaskSetManager: Serialized task 0.0:1 as 1338 bytes in 0 ms
> 14/06/03 19:54:56 INFO TaskSetManager: Starting task 0.0:7 as TID 10 on executor 201406031732-3213994176-5050-6320-11: IMPETUS-DSRV05.impetus.co.in (PROCESS_LOCAL)
> 14/06/03 19:54:56 INFO TaskSetManager: Serialized task 0.0:7 as 1338 bytes in 0 ms
> 14/06/03 19:54:56 INFO TaskSetManager: Starting task 0.0:5 as TID 11 on executor 201406031732-3213994176-5050-6320-10: IMPETUS-DSRV04.impetus.co.in (PROCESS_LOCAL)
> 14/06/03 19:54:56 INFO TaskSetManager: Serialized task 0.0:5 as 1338 bytes in 0 ms
> 14/06/03 19:54:57 INFO TaskSetManager: Re-queueing tasks for 201406031732-3213994176-5050-6320-11 from TaskSet 0.0
> 14/06/03 19:54:57 WARN TaskSetManager: Lost TID 8 (task 0.0:3)
> 14/06/03 19:54:57 WARN TaskSetManager: Lost TID 2 (task 0.0:2)
> 14/06/03 19:54:57 WARN TaskSetManager: Lost TID 4 (task 0.0:4)
> 14/06/03 19:54:57 WARN TaskSetManager: Lost TID 10 (task 0.0:7)
> 14/06/03 19:54:57 WARN TaskSetManager: Lost TID 6 (task 0.0:6)
> 14/06/03 19:54:57 WARN TaskSetManager: Lost TID 0 (task 0.0:0)
> 14/06/03 19:54:57 INFO DAGScheduler: Executor lost: 201406031732-3213994176-5050-6320-11 (epoch 1)
> 14/06/03 19:54:57 INFO BlockManagerMasterActor: Trying to remove executor 201406031732-3213994176-5050-6320-11 from BlockManagerMaster.
> 14/06/03 19:54:57 INFO BlockManagerMaster: Removed 201406031732-3213994176-5050-6320-11 successfully in removeExecutor
> 14/06/03 19:54:57 INFO DAGScheduler: Host gained which was in lost list earlier: IMPETUS-DSRV05.impetus.co.in
> 14/06/03 19:54:57 INFO TaskSetManager: Starting task 0.0:0 as TID 12 on executor 201406031732-3213994176-5050-6320-11: IMPETUS-DSRV05.impetus.co.in (PROCESS_LOCAL)
> 14/06/03 19:54:57 INFO TaskSetManager: Serialized task 0.0:0 as 1338 bytes in 1 ms
> 14/06/03 19:54:57 INFO TaskSetManager: Starting task 0.0:6 as TID 13 on executor 201406031732-3213994176-5050-6320-10: IMPETUS-DSRV04.impetus.co.in (PROCESS_LOCAL)
> 14/06/03 19:54:57 INFO TaskSetManager: Serialized task 0.0:6 as 1338 bytes in 0 ms
> 14/06/03 19:54:57 INFO TaskSetManager: Starting task 0.0:7 as TID 14 on executor 201406031732-3213994176-5050-6320-11: IMPETUS-DSRV05.impetus.co.in (PROCESS_LOCAL)
> 14/06/03 19:54:57 INFO TaskSetManager: Serialized task 0.0:7 as 1338 bytes in 1 ms
> 14/06/03 19:54:57 INFO TaskSetManager: Starting task 0.0:4 as TID 15 on executor 201406031732-3213994176-5050-6320-10: IMPETUS-DSRV04.impetus.co.in (PROCESS_LOCAL)
> 14/06/03 19:54:57 INFO TaskSetManager: Serialized task 0.0:4 as 1338 bytes in 0 ms
> 14/06/03 19:54:57 INFO TaskSetManager: Starting task 0.0:2 as TID 16 on executor 201406031732-3213994176-5050-6320-11: IMPETUS-DSRV05.impetus.co.in (PROCESS_LOCAL)
> 14/06/03 19:54:57 INFO TaskSetManager: Serialized task 0.0:2 as 1338 bytes in 0 ms
> 14/06/03 19:54:57 INFO TaskSetManager: Starting task 0.0:3 as TID 17 on executor 201406031732-3213994176-5050-6320-10: IMPETUS-DSRV04.impetus.co.in (PROCESS_LOCAL)
> 14/06/03 19:54:57 INFO TaskSetManager: Serialized task 0.0:3 as 1338 bytes in 1 ms
> 14/06/03 19:54:57 INFO TaskSetManager: Re-queueing tasks for 201406031732-3213994176-5050-6320-11 from TaskSet 0.0
> 14/06/03 19:54:57 WARN TaskSetManager: Lost TID 14 (task 0.0:7)
> 14/06/03 19:54:57 WARN TaskSetManager: Lost TID 16 (task 0.0:2)
> 14/06/03 19:54:57 WARN TaskSetManager: Lost TID 12 (task 0.0:0)
> 14/06/03 19:54:57 INFO DAGScheduler: Executor lost: 201406031732-3213994176-5050-6320-11 (epoch 2)
> 14/06/03 19:54:57 INFO BlockManagerMasterActor: Trying to remove executor 201406031732-3213994176-5050-6320-11 from BlockManagerMaster.
> 14/06/03 19:54:57 INFO BlockManagerMaster: Removed 201406031732-3213994176-5050-6320-11 successfully in removeExecutor
> 14/06/03 19:54:57 INFO DAGScheduler: Host gained which was in lost list earlier: IMPETUS-DSRV05.impetus.co.in
> 14/06/03 19:54:57 INFO TaskSetManager: Starting task 0.0:0 as TID 18 on executor 201406031732-3213994176-5050-6320-11: IMPETUS-DSRV05.impetus.co.in (PROCESS_LOCAL)
> 14/06/03 19:54:57 INFO TaskSetManager: Serialized task 0.0:0 as 1338 bytes in 0 ms
> 14/06/03 19:54:57 INFO TaskSetManager: Starting task 0.0:2 as TID 19 on executor 201406031732-3213994176-5050-6320-11: IMPETUS-DSRV05.impetus.co.in (PROCESS_LOCAL)
> 14/06/03 19:54:57 INFO TaskSetManager: Serialized task 0.0:2 as 1338 bytes in 0 ms
> 14/06/03 19:54:57 INFO TaskSetManager: Starting task 0.0:7 as TID 20 on executor 201406031732-3213994176-5050-6320-11: IMPETUS-DSRV05.impetus.co.in (PROCESS_LOCAL)
> 14/06/03 19:54:57 INFO TaskSetManager: Serialized task 0.0:7 as 1338 bytes in 0 ms
> 14/06/03 19:54:58 INFO TaskSetManager: Re-queueing tasks for 201406031732-3213994176-5050-6320-10 from TaskSet 0.0
> 14/06/03 19:54:58 WARN TaskSetManager: Lost TID 17 (task 0.0:3)
> 14/06/03 19:54:58 WARN TaskSetManager: Lost TID 11 (task 0.0:5)
> 14/06/03 19:54:58 WARN TaskSetManager: Lost TID 13 (task 0.0:6)
> 14/06/03 19:54:58 WARN TaskSetManager: Lost TID 9 (task 0.0:1)
> 14/06/03 19:54:58 WARN TaskSetManager: Lost TID 15 (task 0.0:4)
> 14/06/03 19:54:58 INFO DAGScheduler: Executor lost: 201406031732-3213994176-5050-6320-10 (epoch 3)
> 14/06/03 19:54:58 INFO BlockManagerMasterActor: Trying to remove executor 201406031732-3213994176-5050-6320-10 from BlockManagerMaster.
> 14/06/03 19:54:58 INFO BlockManagerMaster: Removed 201406031732-3213994176-5050-6320-10 successfully in removeExecutor
> 14/06/03 19:54:58 INFO DAGScheduler: Host gained which was in lost list earlier: IMPETUS-DSRV04.impetus.co.in
> 14/06/03 19:54:58 INFO TaskSetManager: Starting task 0.0:4 as TID 21 on executor 201406031732-3213994176-5050-6320-11: IMPETUS-DSRV05.impetus.co.in (PROCESS_LOCAL)
> 14/06/03 19:54:58 INFO TaskSetManager: Serialized task 0.0:4 as 1338 bytes in 0 ms
> 14/06/03 19:54:58 INFO TaskSetManager: Starting task 0.0:1 as TID 22 on executor 201406031732-3213994176-5050-6320-10: IMPETUS-DSRV04.impetus.co.in (PROCESS_LOCAL)
> 14/06/03 19:54:58 INFO TaskSetManager: Serialized task 0.0:1 as 1338 bytes in 0 ms
> 14/06/03 19:54:58 INFO TaskSetManager: Starting task 0.0:6 as TID 23 on executor 201406031732-3213994176-5050-6320-11: IMPETUS-DSRV05.impetus.co.in (PROCESS_LOCAL)
> 14/06/03 19:54:58 INFO TaskSetManager: Serialized task 0.0:6 as 1338 bytes in 0 ms
> 14/06/03 19:54:58 INFO TaskSetManager: Starting task 0.0:5 as TID 24 on executor 201406031732-3213994176-5050-6320-10: IMPETUS-DSRV04.impetus.co.in (PROCESS_LOCAL)
> 14/06/03 19:54:58 INFO TaskSetManager: Serialized task 0.0:5 as 1338 bytes in 1 ms
> 14/06/03 19:54:58 INFO TaskSetManager: Starting task 0.0:3 as TID 25 on executor 201406031732-3213994176-5050-6320-10: IMPETUS-DSRV04.impetus.co.in (PROCESS_LOCAL)
> 14/06/03 19:54:58 INFO TaskSetManager: Serialized task 0.0:3 as 1338 bytes in 0 ms
> 14/06/03 19:54:59 INFO TaskSetManager: Re-queueing tasks for 201406031732-3213994176-5050-6320-11 from TaskSet 0.0
> 14/06/03 19:54:59 WARN TaskSetManager: Lost TID 23 (task 0.0:6)
> 14/06/03 19:54:59 WARN TaskSetManager: Lost TID 20 (task 0.0:7)
> 14/06/03 19:54:59 ERROR TaskSetManager: Task 0.0:7 failed 4 times; aborting job
> 14/06/03 19:54:59 INFO DAGScheduler: Failed to run collect at <console>:17
> 14/06/03 19:54:59 INFO DAGScheduler: Executor lost: 201406031732-3213994176-5050-6320-11 (epoch 4)
> 14/06/03 19:54:59 INFO BlockManagerMasterActor: Trying to remove executor 201406031732-3213994176-5050-6320-11 from BlockManagerMaster.
> 14/06/03 19:54:59 INFO BlockManagerMaster: Removed 201406031732-3213994176-5050-6320-11 successfully in removeExecutor
> 14/06/03 19:54:59 INFO DAGScheduler: Host gained which was in lost list earlier: IMPETUS-DSRV05.impetus.co.in
> org.apache.spark.SparkException: Job aborted: Task 0.0:7 failed 4 times (most recent failure: unknown)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)
>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1018)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
>         at scala.Option.foreach(Option.scala:236)
>         at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:604)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>         at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> >
> > scala> 14/06/03 19:55:00 INFO TaskSetManager: Re-queueing tasks for 201406031732-3213994176-5050-6320-10 from TaskSet 0.0
> > 14/06/03 19:55:00 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
> > 14/06/03 19:55:00 INFO DAGScheduler: Executor lost: 201406031732-3213994176-5050-6320-10 (epoch 5)
> > 14/06/03 19:55:00 INFO BlockManagerMasterActor: Trying to remove executor 201406031732-3213994176-5050-6320-10 from BlockManagerMaster.
> > 14/06/03 19:55:00 INFO BlockManagerMaster: Removed 201406031732-3213994176-5050-6320-10 successfully in removeExecutor
> > 14/06/03 19:55:00 INFO DAGScheduler: Host gained which was in lost list earlier: IMPETUS-DSRV04.impetus.co.in
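The abort message follows from Spark's retry limit: each task may fail up to `spark.task.maxFailures` times (4 by default) before the job is aborted. In this log, task 0.0:7 was lost as TIDs 7, 10, 14 and 20, and every loss sits inside a "Re-queueing tasks for <executor>" / "Executor lost" sequence. That pattern suggests the executors themselves died shortly after launch, rather than the filter throwing an exception, which is consistent with checking the tarball and the worker logs first. The helper below is a hypothetical sketch (not part of Spark) that tallies the "Lost TID" lines per task so this pattern is easy to spot in a long log.

```scala
object LostTaskTally {
  // Matches lines such as "WARN TaskSetManager: Lost TID 20 (task 0.0:7)".
  private val LostTid = """Lost TID (\d+) \(task ([\d.]+:\d+)\)""".r

  /** For each task id, the TIDs of its lost attempts in log order. */
  def lostAttempts(lines: Seq[String]): Map[String, Seq[Int]] =
    lines
      .flatMap(line => LostTid.findFirstMatchIn(line).map(m => m.group(2) -> m.group(1).toInt))
      .groupBy(_._1)
      .map { case (task, pairs) => task -> pairs.map(_._2) }

  def main(args: Array[String]): Unit = {
    val log = Seq(
      "14/06/03 19:54:56 WARN TaskSetManager: Lost TID 7 (task 0.0:7)",
      "14/06/03 19:54:57 WARN TaskSetManager: Lost TID 10 (task 0.0:7)",
      "14/06/03 19:54:57 WARN TaskSetManager: Lost TID 14 (task 0.0:7)",
      "14/06/03 19:54:59 WARN TaskSetManager: Lost TID 20 (task 0.0:7)",
      "14/06/03 19:54:59 WARN TaskSetManager: Lost TID 23 (task 0.0:6)"
    )
    val maxFailures = 4 // default value of spark.task.maxFailures
    lostAttempts(log).toSeq.sortBy(_._1).foreach { case (task, tids) =>
      val note = if (tids.size >= maxFailures) "  <- reached the retry limit" else ""
      println(s"task $task lost ${tids.size} time(s): TIDs ${tids.mkString(", ")}$note")
    }
  }
}
```

Fed every "Lost TID" line from the log above, it reports four losses for task 0.0:7, three each for 0.0:3 and 0.0:6, and two for each of the other tasks.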
>
> I've checked my Spark configuration many times and it looks fine to me.
> Any ideas what might have gone wrong?
>
> --
> Thanks
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-not-working-with-mesos-tp6806.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
