spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alex Nastetsky <alex.nastet...@vervemobile.com>
Subject foreachPartition
Date Fri, 30 Oct 2015 23:42:11 GMT
I'm just trying to do some operation inside foreachPartition, but I can't
even get a simple println to work. Nothing gets printed.

scala> val a = sc.parallelize(List(1,2,3))
a: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[2] at parallelize
at <console>:21

scala> a.foreachPartition(p => println("foo"))
2015-10-30 23:38:54,643 INFO  [main] spark.SparkContext
(Logging.scala:logInfo(59)) - Starting job: foreachPartition at <console>:24
2015-10-30 23:38:54,644 INFO  [dag-scheduler-event-loop]
scheduler.DAGScheduler (Logging.scala:logInfo(59)) - Got job 9
(foreachPartition at <console>:24) with 3 output partitions
(allowLocal=false)
2015-10-30 23:38:54,644 INFO  [dag-scheduler-event-loop]
scheduler.DAGScheduler (Logging.scala:logInfo(59)) - Final stage:
ResultStage 9(foreachPartition at <console>:24)
2015-10-30 23:38:54,645 INFO  [dag-scheduler-event-loop]
scheduler.DAGScheduler (Logging.scala:logInfo(59)) - Parents of final
stage: List()
2015-10-30 23:38:54,645 INFO  [dag-scheduler-event-loop]
scheduler.DAGScheduler (Logging.scala:logInfo(59)) - Missing parents: List()
2015-10-30 23:38:54,646 INFO  [dag-scheduler-event-loop]
scheduler.DAGScheduler (Logging.scala:logInfo(59)) - Submitting ResultStage
9 (ParallelCollectionRDD[2] at parallelize at <console>:21), which has no
missing parents
2015-10-30 23:38:54,648 INFO  [dag-scheduler-event-loop]
storage.MemoryStore (Logging.scala:logInfo(59)) - ensureFreeSpace(1224)
called with curMem=14486, maxMem=280496701
2015-10-30 23:38:54,649 INFO  [dag-scheduler-event-loop]
storage.MemoryStore (Logging.scala:logInfo(59)) - Block broadcast_9 stored
as values in memory (estimated size 1224.0 B, free 267.5 MB)
2015-10-30 23:38:54,680 INFO  [dag-scheduler-event-loop]
storage.MemoryStore (Logging.scala:logInfo(59)) - ensureFreeSpace(871)
called with curMem=15710, maxMem=280496701
2015-10-30 23:38:54,681 INFO  [dag-scheduler-event-loop]
storage.MemoryStore (Logging.scala:logInfo(59)) - Block broadcast_9_piece0
stored as bytes in memory (estimated size 871.0 B, free 267.5 MB)
2015-10-30 23:38:54,685 INFO
 [sparkDriver-akka.actor.default-dispatcher-15] storage.BlockManagerInfo
(Logging.scala:logInfo(59)) - Added broadcast_9_piece0 in memory on
10.170.11.94:35814 (size: 871.0 B, free: 267.5 MB)
2015-10-30 23:38:54,688 INFO
 [sparkDriver-akka.actor.default-dispatcher-15] storage.BlockManagerInfo
(Logging.scala:logInfo(59)) - Removed broadcast_4_piece0 on
10.170.11.94:35814 in memory (size: 2.0 KB, free: 267.5 MB)
2015-10-30 23:38:54,691 INFO
 [sparkDriver-akka.actor.default-dispatcher-15] storage.BlockManagerInfo
(Logging.scala:logInfo(59)) - Removed broadcast_4_piece0 on
ip-10-51-144-180.ec2.internal:35111 in memory (size: 2.0 KB, free: 535.0 MB)
2015-10-30 23:38:54,691 INFO
 [sparkDriver-akka.actor.default-dispatcher-15] storage.BlockManagerInfo
(Logging.scala:logInfo(59)) - Removed broadcast_4_piece0 on
ip-10-51-144-180.ec2.internal:49833 in memory (size: 2.0 KB, free: 535.0 MB)
2015-10-30 23:38:54,694 INFO
 [sparkDriver-akka.actor.default-dispatcher-15] storage.BlockManagerInfo
(Logging.scala:logInfo(59)) - Removed broadcast_4_piece0 on
ip-10-51-144-180.ec2.internal:34776 in memory (size: 2.0 KB, free: 535.0 MB)
2015-10-30 23:38:54,702 INFO  [dag-scheduler-event-loop] spark.SparkContext
(Logging.scala:logInfo(59)) - Created broadcast 9 from broadcast at
DAGScheduler.scala:874
2015-10-30 23:38:54,703 INFO  [dag-scheduler-event-loop]
scheduler.DAGScheduler (Logging.scala:logInfo(59)) - Submitting 3 missing
tasks from ResultStage 9 (ParallelCollectionRDD[2] at parallelize at
<console>:21)
2015-10-30 23:38:54,703 INFO  [dag-scheduler-event-loop]
cluster.YarnScheduler (Logging.scala:logInfo(59)) - Adding task set 9.0
with 3 tasks
2015-10-30 23:38:54,708 INFO
 [sparkDriver-akka.actor.default-dispatcher-15] scheduler.TaskSetManager
(Logging.scala:logInfo(59)) - Starting task 0.0 in stage 9.0 (TID 27,
ip-10-51-144-180.ec2.internal, PROCESS_LOCAL, 1313 bytes)
2015-10-30 23:38:54,711 INFO
 [sparkDriver-akka.actor.default-dispatcher-15] scheduler.TaskSetManager
(Logging.scala:logInfo(59)) - Starting task 1.0 in stage 9.0 (TID 28,
ip-10-51-144-180.ec2.internal, PROCESS_LOCAL, 1313 bytes)
2015-10-30 23:38:54,713 INFO  [sparkDriver-akka.actor.default-dispatcher-2]
storage.BlockManagerInfo (Logging.scala:logInfo(59)) - Removed
broadcast_5_piece0 on 10.170.11.94:35814 in memory (size: 802.0 B, free:
267.5 MB)
2015-10-30 23:38:54,714 INFO
 [sparkDriver-akka.actor.default-dispatcher-15] scheduler.TaskSetManager
(Logging.scala:logInfo(59)) - Starting task 2.0 in stage 9.0 (TID 29,
ip-10-51-144-180.ec2.internal, PROCESS_LOCAL, 1313 bytes)
2015-10-30 23:38:54,716 INFO  [sparkDriver-akka.actor.default-dispatcher-2]
storage.BlockManagerInfo (Logging.scala:logInfo(59)) - Removed
broadcast_5_piece0 on ip-10-51-144-180.ec2.internal:34776 in memory (size:
802.0 B, free: 535.0 MB)
2015-10-30 23:38:54,719 INFO  [sparkDriver-akka.actor.default-dispatcher-2]
storage.BlockManagerInfo (Logging.scala:logInfo(59)) - Removed
broadcast_5_piece0 on ip-10-51-144-180.ec2.internal:35111 in memory (size:
802.0 B, free: 535.0 MB)
2015-10-30 23:38:54,723 INFO  [sparkDriver-akka.actor.default-dispatcher-2]
storage.BlockManagerInfo (Logging.scala:logInfo(59)) - Removed
broadcast_5_piece0 on ip-10-51-144-180.ec2.internal:49833 in memory (size:
802.0 B, free: 535.0 MB)
2015-10-30 23:38:54,743 INFO
 [sparkDriver-akka.actor.default-dispatcher-15] storage.BlockManagerInfo
(Logging.scala:logInfo(59)) - Removed broadcast_6_piece0 on
10.170.11.94:35814 in memory (size: 755.0 B, free: 267.5 MB)
2015-10-30 23:38:54,750 INFO
 [sparkDriver-akka.actor.default-dispatcher-15] storage.BlockManagerInfo
(Logging.scala:logInfo(59)) - Added broadcast_9_piece0 in memory on
ip-10-51-144-180.ec2.internal:35111 (size: 871.0 B, free: 535.0 MB)
2015-10-30 23:38:54,754 INFO
 [sparkDriver-akka.actor.default-dispatcher-15] storage.BlockManagerInfo
(Logging.scala:logInfo(59)) - Added broadcast_9_piece0 in memory on
ip-10-51-144-180.ec2.internal:34776 (size: 871.0 B, free: 535.0 MB)
2015-10-30 23:38:54,756 INFO
 [sparkDriver-akka.actor.default-dispatcher-15] storage.BlockManagerInfo
(Logging.scala:logInfo(59)) - Added broadcast_9_piece0 in memory on
ip-10-51-144-180.ec2.internal:49833 (size: 871.0 B, free: 535.0 MB)
2015-10-30 23:38:54,758 INFO
 [sparkDriver-akka.actor.default-dispatcher-15] storage.BlockManagerInfo
(Logging.scala:logInfo(59)) - Removed broadcast_6_piece0 on
ip-10-51-144-180.ec2.internal:35111 in memory (size: 755.0 B, free: 535.0
MB)
2015-10-30 23:38:54,759 INFO
 [sparkDriver-akka.actor.default-dispatcher-15] storage.BlockManagerInfo
(Logging.scala:logInfo(59)) - Removed broadcast_6_piece0 on
ip-10-51-144-180.ec2.internal:49833 in memory (size: 755.0 B, free: 535.0
MB)
2015-10-30 23:38:54,760 INFO
 [sparkDriver-akka.actor.default-dispatcher-15] storage.BlockManagerInfo
(Logging.scala:logInfo(59)) - Removed broadcast_6_piece0 on
ip-10-51-144-180.ec2.internal:34776 in memory (size: 755.0 B, free: 535.0
MB)
2015-10-30 23:38:54,777 INFO  [task-result-getter-1]
scheduler.TaskSetManager (Logging.scala:logInfo(59)) - Finished task 1.0 in
stage 9.0 (TID 28) in 68 ms on ip-10-51-144-180.ec2.internal (1/3)
2015-10-30 23:38:54,783 INFO  [task-result-getter-3]
scheduler.TaskSetManager (Logging.scala:logInfo(59)) - Finished task 2.0 in
stage 9.0 (TID 29) in 70 ms on ip-10-51-144-180.ec2.internal (2/3)
2015-10-30 23:38:54,785 INFO  [task-result-getter-2]
scheduler.TaskSetManager (Logging.scala:logInfo(59)) - Finished task 0.0 in
stage 9.0 (TID 27) in 81 ms on ip-10-51-144-180.ec2.internal (3/3)
2015-10-30 23:38:54,785 INFO  [task-result-getter-2] cluster.YarnScheduler
(Logging.scala:logInfo(59)) - Removed TaskSet 9.0, whose tasks have all
completed, from pool
2015-10-30 23:38:54,786 INFO  [dag-scheduler-event-loop]
scheduler.DAGScheduler (Logging.scala:logInfo(59)) - ResultStage 9
(foreachPartition at <console>:24) finished in 0.083 s
2015-10-30 23:38:54,786 INFO  [main] scheduler.DAGScheduler
(Logging.scala:logInfo(59)) - Job 9 finished: foreachPartition at
<console>:24, took 0.143089 s

Mime
View raw message