spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alex Nastetsky <alex.nastet...@vervemobile.com>
Subject Re: foreachPartition
Date Sat, 31 Oct 2015 00:02:53 GMT
Ahh, makes sense. Knew it was going to be something simple. Thanks.

On Fri, Oct 30, 2015 at 7:45 PM, Mark Hamstra <mark@clearstorydata.com>
wrote:

> The closure is sent to and executed an Executor, so you need to be looking
> at the stdout of the Executors, not on the Driver.
>
> On Fri, Oct 30, 2015 at 4:42 PM, Alex Nastetsky <
> alex.nastetsky@vervemobile.com> wrote:
>
>> I'm just trying to do some operation inside foreachPartition, but I can't
>> even get a simple println to work. Nothing gets printed.
>>
>> scala> val a = sc.parallelize(List(1,2,3))
>> a: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[2] at
>> parallelize at <console>:21
>>
>> scala> a.foreachPartition(p => println("foo"))
>> 2015-10-30 23:38:54,643 INFO  [main] spark.SparkContext
>> (Logging.scala:logInfo(59)) - Starting job: foreachPartition at <console>:24
>> 2015-10-30 23:38:54,644 INFO  [dag-scheduler-event-loop]
>> scheduler.DAGScheduler (Logging.scala:logInfo(59)) - Got job 9
>> (foreachPartition at <console>:24) with 3 output partitions
>> (allowLocal=false)
>> 2015-10-30 23:38:54,644 INFO  [dag-scheduler-event-loop]
>> scheduler.DAGScheduler (Logging.scala:logInfo(59)) - Final stage:
>> ResultStage 9(foreachPartition at <console>:24)
>> 2015-10-30 23:38:54,645 INFO  [dag-scheduler-event-loop]
>> scheduler.DAGScheduler (Logging.scala:logInfo(59)) - Parents of final
>> stage: List()
>> 2015-10-30 23:38:54,645 INFO  [dag-scheduler-event-loop]
>> scheduler.DAGScheduler (Logging.scala:logInfo(59)) - Missing parents: List()
>> 2015-10-30 23:38:54,646 INFO  [dag-scheduler-event-loop]
>> scheduler.DAGScheduler (Logging.scala:logInfo(59)) - Submitting ResultStage
>> 9 (ParallelCollectionRDD[2] at parallelize at <console>:21), which has no
>> missing parents
>> 2015-10-30 23:38:54,648 INFO  [dag-scheduler-event-loop]
>> storage.MemoryStore (Logging.scala:logInfo(59)) - ensureFreeSpace(1224)
>> called with curMem=14486, maxMem=280496701
>> 2015-10-30 23:38:54,649 INFO  [dag-scheduler-event-loop]
>> storage.MemoryStore (Logging.scala:logInfo(59)) - Block broadcast_9 stored
>> as values in memory (estimated size 1224.0 B, free 267.5 MB)
>> 2015-10-30 23:38:54,680 INFO  [dag-scheduler-event-loop]
>> storage.MemoryStore (Logging.scala:logInfo(59)) - ensureFreeSpace(871)
>> called with curMem=15710, maxMem=280496701
>> 2015-10-30 23:38:54,681 INFO  [dag-scheduler-event-loop]
>> storage.MemoryStore (Logging.scala:logInfo(59)) - Block broadcast_9_piece0
>> stored as bytes in memory (estimated size 871.0 B, free 267.5 MB)
>> 2015-10-30 23:38:54,685 INFO
>>  [sparkDriver-akka.actor.default-dispatcher-15] storage.BlockManagerInfo
>> (Logging.scala:logInfo(59)) - Added broadcast_9_piece0 in memory on
>> 10.170.11.94:35814 (size: 871.0 B, free: 267.5 MB)
>> 2015-10-30 23:38:54,688 INFO
>>  [sparkDriver-akka.actor.default-dispatcher-15] storage.BlockManagerInfo
>> (Logging.scala:logInfo(59)) - Removed broadcast_4_piece0 on
>> 10.170.11.94:35814 in memory (size: 2.0 KB, free: 267.5 MB)
>> 2015-10-30 23:38:54,691 INFO
>>  [sparkDriver-akka.actor.default-dispatcher-15] storage.BlockManagerInfo
>> (Logging.scala:logInfo(59)) - Removed broadcast_4_piece0 on
>> ip-10-51-144-180.ec2.internal:35111 in memory (size: 2.0 KB, free: 535.0 MB)
>> 2015-10-30 23:38:54,691 INFO
>>  [sparkDriver-akka.actor.default-dispatcher-15] storage.BlockManagerInfo
>> (Logging.scala:logInfo(59)) - Removed broadcast_4_piece0 on
>> ip-10-51-144-180.ec2.internal:49833 in memory (size: 2.0 KB, free: 535.0 MB)
>> 2015-10-30 23:38:54,694 INFO
>>  [sparkDriver-akka.actor.default-dispatcher-15] storage.BlockManagerInfo
>> (Logging.scala:logInfo(59)) - Removed broadcast_4_piece0 on
>> ip-10-51-144-180.ec2.internal:34776 in memory (size: 2.0 KB, free: 535.0 MB)
>> 2015-10-30 23:38:54,702 INFO  [dag-scheduler-event-loop]
>> spark.SparkContext (Logging.scala:logInfo(59)) - Created broadcast 9 from
>> broadcast at DAGScheduler.scala:874
>> 2015-10-30 23:38:54,703 INFO  [dag-scheduler-event-loop]
>> scheduler.DAGScheduler (Logging.scala:logInfo(59)) - Submitting 3 missing
>> tasks from ResultStage 9 (ParallelCollectionRDD[2] at parallelize at
>> <console>:21)
>> 2015-10-30 23:38:54,703 INFO  [dag-scheduler-event-loop]
>> cluster.YarnScheduler (Logging.scala:logInfo(59)) - Adding task set 9.0
>> with 3 tasks
>> 2015-10-30 23:38:54,708 INFO
>>  [sparkDriver-akka.actor.default-dispatcher-15] scheduler.TaskSetManager
>> (Logging.scala:logInfo(59)) - Starting task 0.0 in stage 9.0 (TID 27,
>> ip-10-51-144-180.ec2.internal, PROCESS_LOCAL, 1313 bytes)
>> 2015-10-30 23:38:54,711 INFO
>>  [sparkDriver-akka.actor.default-dispatcher-15] scheduler.TaskSetManager
>> (Logging.scala:logInfo(59)) - Starting task 1.0 in stage 9.0 (TID 28,
>> ip-10-51-144-180.ec2.internal, PROCESS_LOCAL, 1313 bytes)
>> 2015-10-30 23:38:54,713 INFO
>>  [sparkDriver-akka.actor.default-dispatcher-2] storage.BlockManagerInfo
>> (Logging.scala:logInfo(59)) - Removed broadcast_5_piece0 on
>> 10.170.11.94:35814 in memory (size: 802.0 B, free: 267.5 MB)
>> 2015-10-30 23:38:54,714 INFO
>>  [sparkDriver-akka.actor.default-dispatcher-15] scheduler.TaskSetManager
>> (Logging.scala:logInfo(59)) - Starting task 2.0 in stage 9.0 (TID 29,
>> ip-10-51-144-180.ec2.internal, PROCESS_LOCAL, 1313 bytes)
>> 2015-10-30 23:38:54,716 INFO
>>  [sparkDriver-akka.actor.default-dispatcher-2] storage.BlockManagerInfo
>> (Logging.scala:logInfo(59)) - Removed broadcast_5_piece0 on
>> ip-10-51-144-180.ec2.internal:34776 in memory (size: 802.0 B, free: 535.0
>> MB)
>> 2015-10-30 23:38:54,719 INFO
>>  [sparkDriver-akka.actor.default-dispatcher-2] storage.BlockManagerInfo
>> (Logging.scala:logInfo(59)) - Removed broadcast_5_piece0 on
>> ip-10-51-144-180.ec2.internal:35111 in memory (size: 802.0 B, free: 535.0
>> MB)
>> 2015-10-30 23:38:54,723 INFO
>>  [sparkDriver-akka.actor.default-dispatcher-2] storage.BlockManagerInfo
>> (Logging.scala:logInfo(59)) - Removed broadcast_5_piece0 on
>> ip-10-51-144-180.ec2.internal:49833 in memory (size: 802.0 B, free: 535.0
>> MB)
>> 2015-10-30 23:38:54,743 INFO
>>  [sparkDriver-akka.actor.default-dispatcher-15] storage.BlockManagerInfo
>> (Logging.scala:logInfo(59)) - Removed broadcast_6_piece0 on
>> 10.170.11.94:35814 in memory (size: 755.0 B, free: 267.5 MB)
>> 2015-10-30 23:38:54,750 INFO
>>  [sparkDriver-akka.actor.default-dispatcher-15] storage.BlockManagerInfo
>> (Logging.scala:logInfo(59)) - Added broadcast_9_piece0 in memory on
>> ip-10-51-144-180.ec2.internal:35111 (size: 871.0 B, free: 535.0 MB)
>> 2015-10-30 23:38:54,754 INFO
>>  [sparkDriver-akka.actor.default-dispatcher-15] storage.BlockManagerInfo
>> (Logging.scala:logInfo(59)) - Added broadcast_9_piece0 in memory on
>> ip-10-51-144-180.ec2.internal:34776 (size: 871.0 B, free: 535.0 MB)
>> 2015-10-30 23:38:54,756 INFO
>>  [sparkDriver-akka.actor.default-dispatcher-15] storage.BlockManagerInfo
>> (Logging.scala:logInfo(59)) - Added broadcast_9_piece0 in memory on
>> ip-10-51-144-180.ec2.internal:49833 (size: 871.0 B, free: 535.0 MB)
>> 2015-10-30 23:38:54,758 INFO
>>  [sparkDriver-akka.actor.default-dispatcher-15] storage.BlockManagerInfo
>> (Logging.scala:logInfo(59)) - Removed broadcast_6_piece0 on
>> ip-10-51-144-180.ec2.internal:35111 in memory (size: 755.0 B, free: 535.0
>> MB)
>> 2015-10-30 23:38:54,759 INFO
>>  [sparkDriver-akka.actor.default-dispatcher-15] storage.BlockManagerInfo
>> (Logging.scala:logInfo(59)) - Removed broadcast_6_piece0 on
>> ip-10-51-144-180.ec2.internal:49833 in memory (size: 755.0 B, free: 535.0
>> MB)
>> 2015-10-30 23:38:54,760 INFO
>>  [sparkDriver-akka.actor.default-dispatcher-15] storage.BlockManagerInfo
>> (Logging.scala:logInfo(59)) - Removed broadcast_6_piece0 on
>> ip-10-51-144-180.ec2.internal:34776 in memory (size: 755.0 B, free: 535.0
>> MB)
>> 2015-10-30 23:38:54,777 INFO  [task-result-getter-1]
>> scheduler.TaskSetManager (Logging.scala:logInfo(59)) - Finished task 1.0 in
>> stage 9.0 (TID 28) in 68 ms on ip-10-51-144-180.ec2.internal (1/3)
>> 2015-10-30 23:38:54,783 INFO  [task-result-getter-3]
>> scheduler.TaskSetManager (Logging.scala:logInfo(59)) - Finished task 2.0 in
>> stage 9.0 (TID 29) in 70 ms on ip-10-51-144-180.ec2.internal (2/3)
>> 2015-10-30 23:38:54,785 INFO  [task-result-getter-2]
>> scheduler.TaskSetManager (Logging.scala:logInfo(59)) - Finished task 0.0 in
>> stage 9.0 (TID 27) in 81 ms on ip-10-51-144-180.ec2.internal (3/3)
>> 2015-10-30 23:38:54,785 INFO  [task-result-getter-2]
>> cluster.YarnScheduler (Logging.scala:logInfo(59)) - Removed TaskSet 9.0,
>> whose tasks have all completed, from pool
>> 2015-10-30 23:38:54,786 INFO  [dag-scheduler-event-loop]
>> scheduler.DAGScheduler (Logging.scala:logInfo(59)) - ResultStage 9
>> (foreachPartition at <console>:24) finished in 0.083 s
>> 2015-10-30 23:38:54,786 INFO  [main] scheduler.DAGScheduler
>> (Logging.scala:logInfo(59)) - Job 9 finished: foreachPartition at
>> <console>:24, took 0.143089 s
>>
>>
>

Mime
View raw message