spark-user mailing list archives

From Tsai Li Ming <mailingl...@ltsai.com>
Subject Understanding stages in WebUI
Date Tue, 25 Nov 2014 08:22:40 GMT
Hi,

I have the classic word count example:
> file.flatMap(line => line.split(" ")).map(word => (word,1)).reduceByKey(_ + _).collect()

From the Job UI, I can only see 2 stages: 0-collect and 1-map.

What happened to the ShuffledRDD from reduceByKey? And why are both the flatMap and map
operations collapsed into a single stage?
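[Editor's note: the two stages line up with the shuffle boundary. Transformations with narrow dependencies (flatMap, map) are pipelined into one stage, and reduceByKey introduces a shuffle dependency that starts the next. A minimal sketch of that grouping, in plain Scala; the Dep model and countStages are hypothetical illustrations, not Spark's actual DAGScheduler API:]

```scala
// Toy model of stage creation: narrow dependencies are pipelined into one
// stage; each shuffle dependency starts a new downstream stage.
// (Hypothetical sketch -- not Spark's real scheduler internals.)
sealed trait Dep
case object Narrow extends Dep   // e.g. flatMap, map: no data movement
case object Shuffle extends Dep  // e.g. reduceByKey: repartitions by key

// For a linear lineage, stage count = 1 + number of shuffle boundaries.
def countStages(lineage: Seq[Dep]): Int =
  1 + lineage.count(_ == Shuffle)

// Word count lineage: flatMap -> map -> reduceByKey -> collect
val wordCount = Seq(Narrow, Narrow, Shuffle)
println(countStages(wordCount))  // prints 2, matching the two stages in the UI
```

[The ShuffledRDD does show up in the log below: Stage 0 is submitted as "ShuffledRDD[7] at reduceByKey". Calling `rdd.toDebugString` in the shell also prints the full lineage with its shuffle boundaries.]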

14/11/25 16:02:35 INFO SparkContext: Starting job: collect at <console>:15
14/11/25 16:02:35 INFO DAGScheduler: Registering RDD 6 (map at <console>:15)
14/11/25 16:02:35 INFO DAGScheduler: Got job 0 (collect at <console>:15) with 2 output partitions (allowLocal=false)
14/11/25 16:02:35 INFO DAGScheduler: Final stage: Stage 0(collect at <console>:15)
14/11/25 16:02:35 INFO DAGScheduler: Parents of final stage: List(Stage 1)
14/11/25 16:02:35 INFO DAGScheduler: Missing parents: List(Stage 1)
14/11/25 16:02:35 INFO DAGScheduler: Submitting Stage 1 (MappedRDD[6] at map at <console>:15), which has no missing parents
14/11/25 16:02:35 INFO MemoryStore: ensureFreeSpace(3464) called with curMem=163705, maxMem=278302556
14/11/25 16:02:35 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.4 KB, free 265.3 MB)
14/11/25 16:02:35 INFO DAGScheduler: Submitting 2 missing tasks from Stage 1 (MappedRDD[6] at map at <console>:15)
14/11/25 16:02:35 INFO TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
14/11/25 16:02:35 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 0, localhost, PROCESS_LOCAL, 1208 bytes)
14/11/25 16:02:35 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 1, localhost, PROCESS_LOCAL, 1208 bytes)
14/11/25 16:02:35 INFO Executor: Running task 0.0 in stage 1.0 (TID 0)
14/11/25 16:02:35 INFO Executor: Running task 1.0 in stage 1.0 (TID 1)
14/11/25 16:02:35 INFO HadoopRDD: Input split: file:/Users/ltsai/Downloads/spark-1.1.0-bin-hadoop2.4/README.md:0+2405
14/11/25 16:02:35 INFO HadoopRDD: Input split: file:/Users/ltsai/Downloads/spark-1.1.0-bin-hadoop2.4/README.md:2405+2406
14/11/25 16:02:35 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
14/11/25 16:02:35 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
14/11/25 16:02:35 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
14/11/25 16:02:35 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
14/11/25 16:02:35 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
14/11/25 16:02:36 INFO Executor: Finished task 0.0 in stage 1.0 (TID 0). 1869 bytes result sent to driver
14/11/25 16:02:36 INFO Executor: Finished task 1.0 in stage 1.0 (TID 1). 1869 bytes result sent to driver
14/11/25 16:02:36 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 0) in 536 ms on localhost (1/2)
14/11/25 16:02:36 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 1) in 529 ms on localhost (2/2)
14/11/25 16:02:36 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
14/11/25 16:02:36 INFO DAGScheduler: Stage 1 (map at <console>:15) finished in 0.562 s
14/11/25 16:02:36 INFO DAGScheduler: looking for newly runnable stages
14/11/25 16:02:36 INFO DAGScheduler: running: Set()
14/11/25 16:02:36 INFO DAGScheduler: waiting: Set(Stage 0)
14/11/25 16:02:36 INFO DAGScheduler: failed: Set()
14/11/25 16:02:36 INFO DAGScheduler: Missing parents for Stage 0: List()
14/11/25 16:02:36 INFO DAGScheduler: Submitting Stage 0 (ShuffledRDD[7] at reduceByKey at <console>:15), which is now runnable
14/11/25 16:02:36 INFO MemoryStore: ensureFreeSpace(2112) called with curMem=167169, maxMem=278302556
14/11/25 16:02:36 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 2.1 KB, free 265.2 MB)
14/11/25 16:02:36 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (ShuffledRDD[7] at reduceByKey at <console>:15)
14/11/25 16:02:36 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
14/11/25 16:02:36 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 2, localhost, PROCESS_LOCAL, 948 bytes)
14/11/25 16:02:36 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 3, localhost, PROCESS_LOCAL, 948 bytes)
14/11/25 16:02:36 INFO Executor: Running task 0.0 in stage 0.0 (TID 2)
14/11/25 16:02:36 INFO Executor: Running task 1.0 in stage 0.0 (TID 3)
14/11/25 16:02:36 INFO BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329
14/11/25 16:02:36 INFO BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329
14/11/25 16:02:36 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
14/11/25 16:02:36 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
14/11/25 16:02:36 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote fetches in 5 ms
14/11/25 16:02:36 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote fetches in 5 ms
14/11/25 16:02:36 INFO Executor: Finished task 0.0 in stage 0.0 (TID 2). 4602 bytes result sent to driver
14/11/25 16:02:36 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 2) in 135 ms on localhost (1/2)
14/11/25 16:02:36 INFO Executor: Finished task 1.0 in stage 0.0 (TID 3). 5051 bytes result sent to driver
14/11/25 16:02:36 INFO DAGScheduler: Stage 0 (collect at <console>:15) finished in 0.168 s
14/11/25 16:02:36 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 3) in 160 ms on localhost (2/2)
14/11/25 16:02:36 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
14/11/25 16:02:36 INFO SparkContext: Job finished: collect at <console>:15, took 1.046888 s

Thanks!


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

