spark-user mailing list archives

From lorraine d almeida <lorrainedalme...@gmail.com>
Subject Array Index out of bounds exception in Bagel program
Date Thu, 05 Sep 2013 10:02:05 GMT
Hi,

I tried to run the WikipediaPageRank program from the examples directory of
Spark, but I am getting the following error. Please help out.
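
(For reference: as far as I can tell from the example's usage message, the
five arguments after the class name are <inputFile> <threshold>
<numPartitions> <host> <usePartitioner>; please correct me if I have that
wrong.)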

hduser@vm4:~/spark-test/spark-0.7.2$ ./run
spark.bagel.examples.WikipediaPageRank pagerank_data.txt 1 3
spark://vm4:7077 true
13/09/05 15:29:11 WARN spark.Utils: Your hostname, vm4 resolves to a
loopback address: 127.0.1.1; using 192.168.0.50 instead (on interface eth0)
13/09/05 15:29:11 WARN spark.Utils: Set SPARK_LOCAL_IP if you need to bind
to another address
13/09/05 15:29:12 INFO slf4j.Slf4jEventHandler: Slf4jEventHandler started
13/09/05 15:29:13 INFO spark.SparkEnv: Registering BlockManagerMaster
13/09/05 15:29:13 INFO storage.MemoryStore: MemoryStore started with
capacity 326.7 MB.
13/09/05 15:29:13 INFO storage.DiskStore: Created local directory at
/tmp/spark-local-20130905152913-c316
13/09/05 15:29:13 INFO network.ConnectionManager: Bound socket to port
42229 with id = ConnectionManagerId(vm4,42229)
13/09/05 15:29:13 INFO storage.BlockManagerMaster: Trying to register
BlockManager
13/09/05 15:29:13 INFO storage.BlockManagerMaster: Registered BlockManager
13/09/05 15:29:13 INFO server.Server: jetty-7.6.8.v20121106
13/09/05 15:29:13 INFO server.AbstractConnector: Started
SocketConnector@0.0.0.0:55371
13/09/05 15:29:13 INFO broadcast.HttpBroadcast: Broadcast server started at
http://192.168.0.50:55371
13/09/05 15:29:13 INFO spark.SparkEnv: Registering MapOutputTracker
13/09/05 15:29:13 INFO spark.HttpFileServer: HTTP File server directory is
/tmp/spark-7c765df4-09bf-492e-a4e3-df87e4f3c0bc
13/09/05 15:29:13 INFO server.Server: jetty-7.6.8.v20121106
13/09/05 15:29:13 INFO server.AbstractConnector: Started
SocketConnector@0.0.0.0:40848
13/09/05 15:29:13 INFO io.IoWorker: IoWorker thread 'spray-io-worker-0'
started
13/09/05 15:29:14 INFO server.HttpServer:
akka://spark/user/BlockManagerHTTPServer started on /0.0.0.0:39257
13/09/05 15:29:14 INFO storage.BlockManagerUI: Started BlockManager web UI
at http://vm4:39257
13/09/05 15:29:14 INFO client.Client$ClientActor: Connecting to master
spark://vm4:7077
13/09/05 15:29:15 INFO cluster.SparkDeploySchedulerBackend: Connected to
Spark cluster with app ID app-20130905152914-0004
13/09/05 15:29:15 INFO client.Client$ClientActor: Executor added:
app-20130905152914-0004/0 on worker-20130905152521-vm4-59380 (vm4) with 1
cores
13/09/05 15:29:15 INFO cluster.SparkDeploySchedulerBackend: Granted
executor ID app-20130905152914-0004/0 on host vm4 with 1 cores, 512.0 MB RAM
13/09/05 15:29:15 INFO client.Client$ClientActor: Executor updated:
app-20130905152914-0004/0 is now RUNNING
13/09/05 15:29:16 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
13/09/05 15:29:17 INFO storage.MemoryStore: ensureFreeSpace(123002) called
with curMem=0, maxMem=342526525
13/09/05 15:29:17 INFO storage.MemoryStore: Block broadcast_0 stored as
values to memory (estimated size 120.1 KB, free 326.5 MB)
13/09/05 15:29:18 INFO spark.KryoSerializer: Running user registrator:
spark.bagel.examples.PRKryoRegistrator
Counting vertices...
13/09/05 15:29:18 INFO mapred.FileInputFormat: Total input paths to process
: 1
13/09/05 15:29:18 INFO spark.SparkContext: Starting job: main at <unknown>:0
13/09/05 15:29:18 INFO scheduler.DAGScheduler: Got job 0 (main at
<unknown>:0) with 2 output partitions (allowLocal=false)
13/09/05 15:29:18 INFO scheduler.DAGScheduler: Final stage: Stage 0 (main
at <unknown>:0)
13/09/05 15:29:18 INFO scheduler.DAGScheduler: Parents of final stage:
List()
13/09/05 15:29:18 INFO scheduler.DAGScheduler: Missing parents: List()
13/09/05 15:29:18 INFO scheduler.DAGScheduler: Submitting Stage 0
(MappedRDD[1] at main at <unknown>:0), which has no missing parents
13/09/05 15:29:18 INFO scheduler.DAGScheduler: Submitting 2 missing tasks
from Stage 0 (MappedRDD[1] at main at <unknown>:0)
13/09/05 15:29:18 INFO cluster.ClusterScheduler: Adding task set 0.0 with 2
tasks
13/09/05 15:29:19 INFO cluster.SparkDeploySchedulerBackend: Registered
executor: Actor[akka://sparkExecutor@vm4:46884/user/Executor] with ID 0
13/09/05 15:29:19 INFO cluster.TaskSetManager: Starting task 0.0:0 as TID 0
on executor 0: vm4 (preferred)
13/09/05 15:29:19 INFO cluster.TaskSetManager: Serialized task 0.0:0 as
1477 bytes in 70 ms
13/09/05 15:29:19 INFO storage.BlockManagerMasterActor$BlockManagerInfo:
Registering block manager vm4:59614 with 326.7 MB RAM
13/09/05 15:29:22 INFO cluster.TaskSetManager: Finished TID 0 in 2583 ms
(progress: 1/2)
13/09/05 15:29:22 INFO scheduler.DAGScheduler: Completed ResultTask(0, 0)
13/09/05 15:29:22 INFO cluster.TaskSetManager: Starting task 0.0:1 as TID 1
on executor 0: vm4 (preferred)
13/09/05 15:29:22 INFO cluster.TaskSetManager: Serialized task 0.0:1 as
1477 bytes in 0 ms
13/09/05 15:29:22 INFO cluster.TaskSetManager: Finished TID 1 in 120 ms
(progress: 2/2)
13/09/05 15:29:22 INFO scheduler.DAGScheduler: Completed ResultTask(0, 1)
13/09/05 15:29:22 INFO scheduler.DAGScheduler: Stage 0 (main at
<unknown>:0) finished in 3.750 s
13/09/05 15:29:22 INFO spark.SparkContext: Job finished: main at
<unknown>:0, took 3.848063895 s
Done counting vertices.
Parsing input file...
Done parsing input file.
13/09/05 15:29:22 INFO bagel.Bagel: Starting superstep 0.
13/09/05 15:29:22 INFO rdd.CoGroupedRDD: Adding one-to-one dependency with
MapPartitionsRDD[7] at main at <unknown>:0
13/09/05 15:29:22 INFO rdd.CoGroupedRDD: Adding shuffle dependency with
ShuffledRDD[3] at main at <unknown>:0
13/09/05 15:29:22 INFO spark.SparkContext: Starting job: main at <unknown>:0
13/09/05 15:29:22 INFO scheduler.DAGScheduler: Registering RDD 5 (main at
<unknown>:0)
13/09/05 15:29:22 INFO scheduler.DAGScheduler: Registering RDD 11 (apply at
TraversableLike.scala:233)
13/09/05 15:29:22 INFO scheduler.DAGScheduler: Registering RDD 2 (main at
<unknown>:0)
13/09/05 15:29:22 INFO scheduler.DAGScheduler: Got job 1 (main at
<unknown>:0) with 3 output partitions (allowLocal=false)
13/09/05 15:29:22 INFO scheduler.DAGScheduler: Final stage: Stage 1 (main
at <unknown>:0)
13/09/05 15:29:22 INFO scheduler.DAGScheduler: Parents of final stage:
List(Stage 2, Stage 3)
13/09/05 15:29:22 INFO scheduler.DAGScheduler: Missing parents: List(Stage
2, Stage 3)
13/09/05 15:29:22 INFO scheduler.DAGScheduler: Submitting Stage 2
(MapPartitionsRDD[5] at main at <unknown>:0), which has no missing parents
13/09/05 15:29:22 INFO scheduler.DAGScheduler: Submitting 2 missing tasks
from Stage 2 (MapPartitionsRDD[5] at main at <unknown>:0)
13/09/05 15:29:22 INFO cluster.ClusterScheduler: Adding task set 2.0 with 2
tasks
13/09/05 15:29:22 INFO cluster.TaskSetManager: Starting task 2.0:0 as TID 2
on executor 0: vm4 (preferred)
13/09/05 15:29:22 INFO scheduler.DAGScheduler: Submitting Stage 4
(MappedRDD[2] at main at <unknown>:0), which has no missing parents
13/09/05 15:29:22 INFO scheduler.DAGScheduler: Submitting 2 missing tasks
from Stage 4 (MappedRDD[2] at main at <unknown>:0)
13/09/05 15:29:22 INFO cluster.ClusterScheduler: Adding task set 4.0 with 2
tasks
13/09/05 15:29:22 INFO cluster.TaskSetManager: Serialized task 2.0:0 as
1713 bytes in 54 ms
13/09/05 15:29:22 INFO cluster.TaskSetManager: Finished TID 2 in 193 ms
(progress: 1/2)
13/09/05 15:29:22 INFO scheduler.DAGScheduler: Completed ShuffleMapTask(2,
0)
13/09/05 15:29:22 INFO cluster.TaskSetManager: Starting task 2.0:1 as TID 3
on executor 0: vm4 (preferred)
13/09/05 15:29:22 INFO cluster.TaskSetManager: Serialized task 2.0:1 as
1713 bytes in 0 ms
13/09/05 15:29:22 INFO cluster.TaskSetManager: Finished TID 3 in 51 ms
(progress: 2/2)
13/09/05 15:29:22 INFO cluster.TaskSetManager: Starting task 4.0:0 as TID 4
on executor 0: vm4 (preferred)
13/09/05 15:29:22 INFO scheduler.DAGScheduler: Completed ShuffleMapTask(2,
1)
13/09/05 15:29:22 INFO scheduler.DAGScheduler: Stage 2 (main at
<unknown>:0) finished in 0.261 s
13/09/05 15:29:22 INFO scheduler.DAGScheduler: looking for newly runnable
stages
13/09/05 15:29:22 INFO scheduler.DAGScheduler: running: Set(Stage 4)
13/09/05 15:29:22 INFO scheduler.DAGScheduler: waiting: Set(Stage 1, Stage
3)
13/09/05 15:29:22 INFO scheduler.DAGScheduler: failed: Set()
13/09/05 15:29:22 INFO cluster.TaskSetManager: Serialized task 4.0:0 as
1676 bytes in 22 ms
13/09/05 15:29:22 INFO scheduler.DAGScheduler: Missing parents for Stage 1:
List(Stage 3)
13/09/05 15:29:22 INFO scheduler.DAGScheduler: Missing parents for Stage 3:
List(Stage 4)
13/09/05 15:29:22 INFO cluster.TaskSetManager: Lost TID 4 (task 4.0:0)
13/09/05 15:29:22 INFO cluster.TaskSetManager: Loss was due to
java.lang.ArrayIndexOutOfBoundsException: 3
    at
spark.bagel.examples.WikipediaPageRank$$anonfun$1.apply(WikipediaPageRank.scala:43)
    at
spark.bagel.examples.WikipediaPageRank$$anonfun$1.apply(WikipediaPageRank.scala:41)
    at scala.collection.Iterator$$anon$19.next(Iterator.scala:401)
    at scala.collection.Iterator$class.foreach(Iterator.scala:772)
    at scala.collection.Iterator$$anon$19.foreach(Iterator.scala:399)
    at spark.scheduler.ShuffleMapTask.run(ShuffleMapTask.scala:127)
    at spark.scheduler.ShuffleMapTask.run(ShuffleMapTask.scala:75)
    at spark.executor.Executor$TaskRunner.run(Executor.scala:98)
    at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
13/09/05 15:29:22 INFO cluster.TaskSetManager: Starting task 4.0:0 as TID 5
on executor 0: vm4 (preferred)
13/09/05 15:29:22 INFO cluster.TaskSetManager: Serialized task 4.0:0 as
1676 bytes in 0 ms
13/09/05 15:29:22 INFO cluster.TaskSetManager: Lost TID 5 (task 4.0:0)
13/09/05 15:29:22 INFO cluster.TaskSetManager: Loss was due to
java.lang.ArrayIndexOutOfBoundsException: 3 [duplicate 1]
13/09/05 15:29:22 INFO cluster.TaskSetManager: Starting task 4.0:0 as TID 6
on executor 0: vm4 (preferred)
13/09/05 15:29:23 INFO cluster.TaskSetManager: Serialized task 4.0:0 as
1676 bytes in 7 ms
13/09/05 15:29:23 INFO cluster.TaskSetManager: Lost TID 6 (task 4.0:0)
13/09/05 15:29:23 INFO cluster.TaskSetManager: Loss was due to
java.lang.ArrayIndexOutOfBoundsException: 3 [duplicate 2]
13/09/05 15:29:23 INFO cluster.TaskSetManager: Starting task 4.0:0 as TID 7
on executor 0: vm4 (preferred)
13/09/05 15:29:23 INFO cluster.TaskSetManager: Serialized task 4.0:0 as
1676 bytes in 0 ms
13/09/05 15:29:23 INFO cluster.TaskSetManager: Lost TID 7 (task 4.0:0)
13/09/05 15:29:23 INFO cluster.TaskSetManager: Loss was due to
java.lang.ArrayIndexOutOfBoundsException: 3 [duplicate 3]
13/09/05 15:29:23 INFO cluster.TaskSetManager: Starting task 4.0:0 as TID 8
on executor 0: vm4 (preferred)
13/09/05 15:29:23 INFO cluster.TaskSetManager: Serialized task 4.0:0 as
1676 bytes in 0 ms
13/09/05 15:29:23 INFO cluster.TaskSetManager: Lost TID 8 (task 4.0:0)
13/09/05 15:29:23 INFO cluster.TaskSetManager: Loss was due to
java.lang.ArrayIndexOutOfBoundsException: 3 [duplicate 4]
13/09/05 15:29:23 ERROR cluster.TaskSetManager: Task 4.0:0 failed more than
4 times; aborting job
13/09/05 15:29:23 INFO scheduler.DAGScheduler: Failed to run main at
<unknown>:0
Exception in thread "main" spark.SparkException: Job failed: Task 4.0:0
failed more than 4 times
    at
spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:642)
    at
spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:640)
    at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:640)
    at spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:303)
    at
spark.scheduler.DAGScheduler.spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:364)
    at spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:107)
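
My guess at the root cause, in case it helps anyone answer: the trace points
at the input-parsing closure (WikipediaPageRank.scala:43), and the index 3 in
the ArrayIndexOutOfBoundsException looks like an access to fields(3) on a line
that split into fewer than four fields. As far as I can tell, this example
expects a Wikipedia dump in the tab-separated WEX format (at least four fields
per line, with the article XML in the fourth), and my pagerank_data.txt is not
in that format. Below is a minimal sketch of that failure mode; the field
layout is my assumption about the WEX format, not copied from the example's
source:

object WexParseCheck {
  def main(args: Array[String]): Unit = {
    // A line in the WEX-style layout the example seems to expect:
    // at least four tab-separated fields, with the article XML at index 3.
    // (Assumed layout, for illustration only.)
    val wexLine   = "12\tSome_Title\t2008-01-01\t<article>...</article>"
    // A line like a plain PageRank sample file: one space-separated pair,
    // which yields a single field when split on tabs.
    val plainLine = "1 2"

    for (line <- Seq(wexLine, plainLine)) {
      val fields = line.split("\t")
      if (fields.length < 4)
        // Reading fields(3) on such a line is what would raise
        // java.lang.ArrayIndexOutOfBoundsException: 3
        println("only " + fields.length +
          " tab-separated field(s); fields(3) would throw")
      else
        println("title=" + fields(1) + ", body=" + fields(3))
    }
  }
}

If that reading is right, I should feed the example a WEX-formatted dump
instead of pagerank_data.txt (or use the plain PageRank example for simple
edge-list input). Is that correct?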
