spark-user mailing list archives

From Sean Owen <so...@cloudera.com>
Subject Re: mesos in spark 2.0.1 - must call stop() otherwise app hangs
Date Wed, 05 Oct 2016 17:10:24 GMT
Being discussed as we speak at
https://issues.apache.org/jira/browse/SPARK-17707

Calling stop() is definitely the right thing to do, and always has been (see
the examples), but it may be possible to get rid of the new non-daemon thread
that prevents shutdown, so that apps can still get away without calling it.
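One way to make sure stop() always runs, even when the job fails part-way, is to wrap session creation in a small context manager. This is just a sketch, not a PySpark API: `spark_session` is a hypothetical helper, and the builder passed in is only assumed to behave like `SparkSession.builder` (i.e. expose getOrCreate()).

```python
from contextlib import contextmanager

@contextmanager
def spark_session(builder):
    # Hypothetical helper (not part of PySpark): build the session,
    # hand it to the caller, and always call stop() on the way out,
    # whether the body finished normally or raised.
    spark = builder.getOrCreate()
    try:
        yield spark
    finally:
        spark.stop()

# Usage sketch (assuming pyspark is installed):
#   from pyspark.sql import SparkSession
#   with spark_session(SparkSession.builder.appName("spark2test")) as spark:
#       df = spark.read.load('/prod/data/core/filtered/parquet/2016-08-01')
#       print("Rowcount=", df.count())
```

With this shape the stop() call can't be forgotten, which sidesteps the hang regardless of whether the non-daemon thread goes away.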

On Wed, Oct 5, 2016 at 6:06 PM Adrian Bridgett <adrian@opensignal.com>
wrote:

> I saw this originally with 2.0.0 but as 2.0.1 is freshly out I thought
> I'd retry.
>
> With pyspark apps, it seems that in 2.x you must call .stop() on the
> spark context as otherwise the application doesn't stop (see log below -
> first gap is when it finishes, second gap is when I hit ^C).
>
> Thought I'd mention this in case a) it's a bug (certainly changed
> behaviour since 1.6.1) and b) anyone else hits it.
>
>
> Unique devices= 1243123123123
> 16/10/05 16:52:30 INFO mesos.MesosCoarseGrainedSchedulerBackend: Capping
> the total amount of executors to 0
>
> (hangs here - spark UI still running - show all executors there)
> (after a while dynamic allocation kicks in which cleans up a little)
>
>
> 16/10/05 16:53:21 INFO storage.BlockManagerInfo: Removed
> broadcast_7_piece0 on ip-10-1-202-125.ec2.internal:18594 in memory
> (size: 3.7 KB, free: 10.5 GB)
> 16/10/05 16:53:21 INFO storage.BlockManagerInfo: Removed
> broadcast_7_piece0 on 10.1.200.212:38722 in memory (size: 3.7 KB, free:
> 912.3 MB)
> 16/10/05 16:53:21 INFO storage.BlockManagerInfo: Removed
> broadcast_4_piece0 on 10.1.200.212:38722 in memory (size: 26.2 KB, free:
> 912.3 MB)
> 16/10/05 16:53:21 INFO storage.BlockManagerInfo: Removed
> broadcast_4_piece0 on ip-10-1-202-125.ec2.internal:18594 in memory
> (size: 26.2 KB, free: 10.5 GB)
> 16/10/05 16:53:21 INFO storage.BlockManagerInfo: Removed
> broadcast_4_piece0 on ip-10-1-201-35.ec2.internal:31353 in memory (size:
> 26.2 KB, free: 10.5 GB)
> 16/10/05 16:53:21 INFO storage.BlockManagerInfo: Removed
> broadcast_4_piece0 on ip-10-1-200-127.ec2.internal:11645 in memory
> (size: 26.2 KB, free: 10.5 GB)
> 16/10/05 16:53:21 INFO storage.BlockManagerInfo: Removed
> broadcast_4_piece0 on ip-10-1-200-126.ec2.internal:17810 in memory
> (size: 26.2 KB, free: 10.5 GB)
> 16/10/05 16:53:21 INFO storage.BlockManagerInfo: Removed
> broadcast_4_piece0 on ip-10-1-201-190.ec2.internal:37033 in memory
> (size: 26.2 KB, free: 10.5 GB)
> 16/10/05 16:53:21 INFO storage.BlockManagerInfo: Removed
> broadcast_4_piece0 on ip-10-1-201-8.ec2.internal:14506 in memory (size:
> 26.2 KB, free: 10.5 GB)
> 16/10/05 16:53:21 INFO storage.BlockManagerInfo: Removed
> broadcast_4_piece0 on ip-10-1-200-128.ec2.internal:36370 in memory
> (size: 26.2 KB, free: 10.5 GB)
> 16/10/05 16:53:21 INFO storage.BlockManagerInfo: Removed
> broadcast_4_piece0 on ip-10-1-202-128.ec2.internal:34654 in memory
> (size: 26.2 KB, free: 10.5 GB)
> 16/10/05 16:53:21 INFO storage.BlockManagerInfo: Removed
> broadcast_4_piece0 on ip-10-1-202-127.ec2.internal:28749 in memory
> (size: 26.2 KB, free: 10.5 GB)
> 16/10/05 16:53:21 INFO spark.ContextCleaner: Cleaned accumulator 3471
> 16/10/05 16:53:21 INFO spark.ContextCleaner: Cleaned accumulator 3472
> 16/10/05 16:53:21 INFO spark.ContextCleaner: Cleaned accumulator 3473
> 16/10/05 16:53:21 INFO spark.ContextCleaner: Cleaned accumulator 3474
> 16/10/05 16:53:21 INFO spark.ContextCleaner: Cleaned accumulator 3475
> 16/10/05 16:53:21 INFO spark.ContextCleaner: Cleaned accumulator 3476
> 16/10/05 16:53:21 INFO spark.ContextCleaner: Cleaned accumulator 3477
> 16/10/05 16:53:21 INFO spark.ContextCleaner: Cleaned accumulator 3478
> 16/10/05 16:53:21 INFO spark.ContextCleaner: Cleaned accumulator 3479
> 16/10/05 16:53:21 INFO spark.ContextCleaner: Cleaned accumulator 3480
> 16/10/05 16:53:21 INFO spark.ContextCleaner: Cleaned accumulator 3481
> 16/10/05 16:53:21 INFO spark.ContextCleaner: Cleaned accumulator 3482
> 16/10/05 16:53:21 INFO spark.ContextCleaner: Cleaned accumulator 3483
> 16/10/05 16:53:21 INFO spark.ContextCleaner: Cleaned accumulator 3484
> 16/10/05 16:53:21 INFO spark.ContextCleaner: Cleaned accumulator 3485
> 16/10/05 16:53:21 INFO spark.ContextCleaner: Cleaned accumulator 3486
> 16/10/05 16:53:21 INFO spark.ContextCleaner: Cleaned accumulator 3487
> 16/10/05 16:53:21 INFO spark.ContextCleaner: Cleaned accumulator 3488
> 16/10/05 16:53:21 INFO spark.ContextCleaner: Cleaned accumulator 3489
> 16/10/05 16:53:21 INFO spark.ContextCleaner: Cleaned accumulator 3490
> 16/10/05 16:53:21 INFO spark.ContextCleaner: Cleaned accumulator 3491
> 16/10/05 16:53:21 INFO spark.ContextCleaner: Cleaned accumulator 3492
> 16/10/05 16:53:21 INFO spark.ContextCleaner: Cleaned accumulator 3493
> 16/10/05 16:53:21 INFO spark.ContextCleaner: Cleaned shuffle 1
> 16/10/05 16:53:21 INFO spark.ContextCleaner: Cleaned shuffle 2
> 16/10/05 16:53:21 INFO storage.BlockManagerInfo: Removed
> broadcast_6_piece0 on 10.1.200.212:38722 in memory (size: 9.9 KB, free:
> 912.3 MB)
> 16/10/05 16:53:21 INFO storage.BlockManagerInfo: Removed
> broadcast_6_piece0 on ip-10-1-202-125.ec2.internal:18594 in memory
> (size: 9.9 KB, free: 10.5 GB)
>
> (but app is still hanging around - not sure what the import sys is doing
> here, but I've left it in case it explains anything!)
>
>
>
>
> ▽
> import sys
>
>
> ▽
> import sys
> (and here is the ^C)
> ^C16/10/05 16:55:21 INFO spark.SparkContext: Invoking stop() from
> shutdown hook
> 16/10/05 16:55:21 INFO server.ServerConnector: Stopped
> ServerConnector@70087549{HTTP/1.1}{0.0.0.0:4040}
> 16/10/05 16:55:21 INFO handler.ContextHandler: Stopped
> o.s.j.s.ServletContextHandler@22516b56
> {/stages/stage/kill,null,UNAVAILABLE}
> 16/10/05 16:55:21 INFO handler.ContextHandler: Stopped
> o.s.j.s.ServletContextHandler@6b5c5804{/api,null,UNAVAILABLE}
> 16/10/05 16:55:21 INFO handler.ContextHandler: Stopped
> o.s.j.s.ServletContextHandler@98268{/,null,UNAVAILABLE}
> 16/10/05 16:55:21 INFO handler.ContextHandler: Stopped
> o.s.j.s.ServletContextHandler@59865b04{/static,null,UNAVAILABLE}
> 16/10/05 16:55:21 INFO handler.ContextHandler: Stopped
> o.s.j.s.ServletContextHandler@5a5272d3
> {/executors/threadDump/json,null,UNAVAILABLE}
> 16/10/05 16:55:21 INFO handler.ContextHandler: Stopped
> o.s.j.s.ServletContextHandler@1ade2c09
> {/executors/threadDump,null,UNAVAILABLE}
> 16/10/05 16:55:21 INFO handler.ContextHandler: Stopped
> o.s.j.s.ServletContextHandler@7ff4c458{/executors/json,null,UNAVAILABLE}
> 16/10/05 16:55:21 INFO handler.ContextHandler: Stopped
> o.s.j.s.ServletContextHandler@667cb568{/executors,null,UNAVAILABLE}
> 16/10/05 16:55:21 INFO handler.ContextHandler: Stopped
> o.s.j.s.ServletContextHandler@60441110{/environment/json,null,UNAVAILABLE}
> 16/10/05 16:55:21 INFO handler.ContextHandler: Stopped
> o.s.j.s.ServletContextHandler@39cb8097{/environment,null,UNAVAILABLE}
> 16/10/05 16:55:21 INFO handler.ContextHandler: Stopped
> o.s.j.s.ServletContextHandler@16b1e403{/storage/rdd/json,null,UNAVAILABLE}
> 16/10/05 16:55:21 INFO handler.ContextHandler: Stopped
> o.s.j.s.ServletContextHandler@6f8d2666{/storage/rdd,null,UNAVAILABLE}
> 16/10/05 16:55:21 INFO handler.ContextHandler: Stopped
> o.s.j.s.ServletContextHandler@7455a06e{/storage/json,null,UNAVAILABLE}
> 16/10/05 16:55:21 INFO handler.ContextHandler: Stopped
> o.s.j.s.ServletContextHandler@5e0e2b61{/storage,null,UNAVAILABLE}
> 16/10/05 16:55:21 INFO handler.ContextHandler: Stopped
> o.s.j.s.ServletContextHandler@5615ea0e{/stages/pool/json,null,UNAVAILABLE}
> 16/10/05 16:55:21 INFO handler.ContextHandler: Stopped
> o.s.j.s.ServletContextHandler@57598349{/stages/pool,null,UNAVAILABLE}
> 16/10/05 16:55:21 INFO handler.ContextHandler: Stopped
> o.s.j.s.ServletContextHandler@6a38093f
> {/stages/stage/json,null,UNAVAILABLE}
> 16/10/05 16:55:21 INFO handler.ContextHandler: Stopped
> o.s.j.s.ServletContextHandler@2b9460e5{/stages/stage,null,UNAVAILABLE}
> 16/10/05 16:55:21 INFO handler.ContextHandler: Stopped
> o.s.j.s.ServletContextHandler@6c3dbe02{/stages/json,null,UNAVAILABLE}
> 16/10/05 16:55:21 INFO handler.ContextHandler: Stopped
> o.s.j.s.ServletContextHandler@1079b7a{/stages,null,UNAVAILABLE}
> 16/10/05 16:55:21 INFO handler.ContextHandler: Stopped
> o.s.j.s.ServletContextHandler@432049a9{/jobs/job/json,null,UNAVAILABLE}
> 16/10/05 16:55:21 INFO handler.ContextHandler: Stopped
> o.s.j.s.ServletContextHandler@6144d55d{/jobs/job,null,UNAVAILABLE}
> 16/10/05 16:55:21 INFO handler.ContextHandler: Stopped
> o.s.j.s.ServletContextHandler@7c4232ba{/jobs/json,null,UNAVAILABLE}
> 16/10/05 16:55:21 INFO handler.ContextHandler: Stopped
> o.s.j.s.ServletContextHandler@291d6f18{/jobs,null,UNAVAILABLE}
> 16/10/05 16:55:21 INFO ui.SparkUI: Stopped Spark web UI at
> http://10.1.200.212:4040
> 16/10/05 16:55:21 INFO mesos.MesosCoarseGrainedSchedulerBackend:
> Shutting down all executors
> 16/10/05 16:55:21 INFO
> cluster.CoarseGrainedSchedulerBackend$DriverEndpoint: Asking each
> executor to shut down
> 16/10/05 16:55:21 INFO mesos.MesosCoarseGrainedSchedulerBackend: Mesos
> task 8 is now TASK_FINISHED
> 16/10/05 16:55:21 INFO mesos.MesosCoarseGrainedSchedulerBackend: Mesos
> task 6 is now TASK_FINISHED
> 16/10/05 16:55:21 INFO mesos.MesosCoarseGrainedSchedulerBackend: Mesos
> task 7 is now TASK_FINISHED
> 16/10/05 16:55:21 INFO mesos.MesosCoarseGrainedSchedulerBackend: Mesos
> task 1 is now TASK_FINISHED
> 16/10/05 16:55:21 INFO mesos.MesosCoarseGrainedSchedulerBackend: Mesos
> task 5 is now TASK_FINISHED
> 16/10/05 16:55:21 INFO mesos.MesosCoarseGrainedSchedulerBackend: Mesos
> task 2 is now TASK_FINISHED
> 16/10/05 16:55:21 INFO mesos.MesosCoarseGrainedSchedulerBackend: Mesos
> task 4 is now TASK_FINISHED
> 16/10/05 16:55:21 INFO mesos.MesosCoarseGrainedSchedulerBackend: Mesos
> task 3 is now TASK_FINISHED
> 16/10/05 16:55:21 INFO mesos.MesosCoarseGrainedSchedulerBackend: Mesos
> task 0 is now TASK_FINISHED
> I1005 16:55:21.862349 16747 sched.cpp:1903] Asked to stop the driver
> I1005 16:55:21.862777 16390 sched.cpp:1143] Stopping framework
> 'cd7fd4fe-1eff-438c-87f9-348461e6a640-1683'
> 16/10/05 16:55:21 INFO mesos.MesosCoarseGrainedSchedulerBackend:
> driver.run() returned with code DRIVER_STOPPED
> 16/10/05 16:55:21 INFO spark.MapOutputTrackerMasterEndpoint:
> MapOutputTrackerMasterEndpoint stopped!
> 16/10/05 16:55:21 INFO memory.MemoryStore: MemoryStore cleared
> 16/10/05 16:55:21 INFO storage.BlockManager: BlockManager stopped
> 16/10/05 16:55:21 INFO storage.BlockManagerMaster: BlockManagerMaster
> stopped
> 16/10/05 16:55:21 INFO
> scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:
> OutputCommitCoordinator stopped!
> 16/10/05 16:55:21 INFO spark.SparkContext: Successfully stopped
> SparkContext
> 16/10/05 16:55:21 INFO util.ShutdownHookManager: Shutdown hook called
> 16/10/05 16:55:21 INFO util.ShutdownHookManager: Deleting directory
>
> /tmp/spark-3d420117-9442-4375-91ac-d54038e2c18b/pyspark-28e7d692-a344-4d0f-9495-3304ccaa520e
> 16/10/05 16:55:21 INFO util.ShutdownHookManager: Deleting directory
> /tmp/spark-3d420117-9442-4375-91ac-d54038e2c18b
>
> And now a run with sc.stop():
> 16/10/05 16:57:52 INFO scheduler.DAGScheduler: Job 2 finished: count at
> NativeMethodAccessorImpl.java:-2, took 12.148737 s
> Unique devices= 1243123123123
> 16/10/05 16:57:53 INFO server.ServerConnector: Stopped
> ServerConnector@6395b5{HTTP/1.1}{0.0.0.0:4040}
> 16/10/05 16:57:53 INFO handler.ContextHandler: Stopped
> o.s.j.s.ServletContextHandler@41d00db1
> {/stages/stage/kill,null,UNAVAILABLE}
> 16/10/05 16:57:53 INFO handler.ContextHandler: Stopped
> o.s.j.s.ServletContextHandler@6dc9bd9a{/api,null,UNAVAILABLE}
> 16/10/05 16:57:53 INFO handler.ContextHandler: Stopped
> o.s.j.s.ServletContextHandler@62c55d05{/,null,UNAVAILABLE}
> 16/10/05 16:57:53 INFO handler.ContextHandler: Stopped
> o.s.j.s.ServletContextHandler@69c56749{/static,null,UNAVAILABLE}
> 16/10/05 16:57:53 INFO handler.ContextHandler: Stopped
> o.s.j.s.ServletContextHandler@2271d89a
> {/executors/threadDump/json,null,UNAVAILABLE}
> 16/10/05 16:57:53 INFO handler.ContextHandler: Stopped
> o.s.j.s.ServletContextHandler@7a7ebfb8
> {/executors/threadDump,null,UNAVAILABLE}
> 16/10/05 16:57:53 INFO handler.ContextHandler: Stopped
> o.s.j.s.ServletContextHandler@4469ec58{/executors/json,null,UNAVAILABLE}
> 16/10/05 16:57:53 INFO handler.ContextHandler: Stopped
> o.s.j.s.ServletContextHandler@231d0c59{/executors,null,UNAVAILABLE}
> 16/10/05 16:57:53 INFO handler.ContextHandler: Stopped
> o.s.j.s.ServletContextHandler@352daf52{/environment/json,null,UNAVAILABLE}
> 16/10/05 16:57:53 INFO handler.ContextHandler: Stopped
> o.s.j.s.ServletContextHandler@72798a1f{/environment,null,UNAVAILABLE}
> 16/10/05 16:57:53 INFO handler.ContextHandler: Stopped
> o.s.j.s.ServletContextHandler@1def5f95{/storage/rdd/json,null,UNAVAILABLE}
> 16/10/05 16:57:53 INFO handler.ContextHandler: Stopped
> o.s.j.s.ServletContextHandler@697feb2d{/storage/rdd,null,UNAVAILABLE}
> 16/10/05 16:57:53 INFO handler.ContextHandler: Stopped
> o.s.j.s.ServletContextHandler@7056f8aa{/storage/json,null,UNAVAILABLE}
> 16/10/05 16:57:53 INFO handler.ContextHandler: Stopped
> o.s.j.s.ServletContextHandler@44056f98{/storage,null,UNAVAILABLE}
> 16/10/05 16:57:53 INFO handler.ContextHandler: Stopped
> o.s.j.s.ServletContextHandler@48e1d5a{/stages/pool/json,null,UNAVAILABLE}
> 16/10/05 16:57:53 INFO handler.ContextHandler: Stopped
> o.s.j.s.ServletContextHandler@54d9ad20{/stages/pool,null,UNAVAILABLE}
> 16/10/05 16:57:53 INFO handler.ContextHandler: Stopped
> o.s.j.s.ServletContextHandler@3921b8da
> {/stages/stage/json,null,UNAVAILABLE}
> 16/10/05 16:57:53 INFO handler.ContextHandler: Stopped
> o.s.j.s.ServletContextHandler@630479fa{/stages/stage,null,UNAVAILABLE}
> 16/10/05 16:57:53 INFO handler.ContextHandler: Stopped
> o.s.j.s.ServletContextHandler@1e69dddc{/stages/json,null,UNAVAILABLE}
> 16/10/05 16:57:53 INFO handler.ContextHandler: Stopped
> o.s.j.s.ServletContextHandler@37eb5fc9{/stages,null,UNAVAILABLE}
> 16/10/05 16:57:53 INFO handler.ContextHandler: Stopped
> o.s.j.s.ServletContextHandler@3feca272{/jobs/job/json,null,UNAVAILABLE}
> 16/10/05 16:57:53 INFO handler.ContextHandler: Stopped
> o.s.j.s.ServletContextHandler@fae47cc{/jobs/job,null,UNAVAILABLE}
> 16/10/05 16:57:53 INFO handler.ContextHandler: Stopped
> o.s.j.s.ServletContextHandler@59587d7{/jobs/json,null,UNAVAILABLE}
> 16/10/05 16:57:53 INFO handler.ContextHandler: Stopped
> o.s.j.s.ServletContextHandler@1fe2fda3{/jobs,null,UNAVAILABLE}
> 16/10/05 16:57:53 INFO ui.SparkUI: Stopped Spark web UI at
> http://10.1.200.212:4040
> 16/10/05 16:57:53 INFO mesos.MesosCoarseGrainedSchedulerBackend: Capping
> the total amount of executors to 0
> 16/10/05 16:57:53 INFO mesos.MesosCoarseGrainedSchedulerBackend:
> Shutting down all executors
> 16/10/05 16:57:53 INFO
> cluster.CoarseGrainedSchedulerBackend$DriverEndpoint: Asking each
> executor to shut down
> 16/10/05 16:57:53 INFO mesos.MesosCoarseGrainedSchedulerBackend: Mesos
> task 6 is now TASK_FINISHED
> 16/10/05 16:57:53 INFO mesos.MesosCoarseGrainedSchedulerBackend: Mesos
> task 1 is now TASK_FINISHED
> 16/10/05 16:57:53 INFO mesos.MesosCoarseGrainedSchedulerBackend: Mesos
> task 7 is now TASK_FINISHED
> 16/10/05 16:57:53 INFO mesos.MesosCoarseGrainedSchedulerBackend: Mesos
> task 8 is now TASK_FINISHED
> 16/10/05 16:57:53 INFO mesos.MesosCoarseGrainedSchedulerBackend: Mesos
> task 2 is now TASK_FINISHED
> 16/10/05 16:57:53 INFO mesos.MesosCoarseGrainedSchedulerBackend: Mesos
> task 3 is now TASK_FINISHED
> 16/10/05 16:57:53 INFO mesos.MesosCoarseGrainedSchedulerBackend: Mesos
> task 4 is now TASK_FINISHED
> 16/10/05 16:57:53 INFO mesos.MesosCoarseGrainedSchedulerBackend: Mesos
> task 5 is now TASK_FINISHED
> 16/10/05 16:57:53 INFO mesos.MesosCoarseGrainedSchedulerBackend: Mesos
> task 0 is now TASK_FINISHED
> I1005 16:57:53.946076 17178 sched.cpp:1903] Asked to stop the driver
> I1005 16:57:53.946187 17215 sched.cpp:1143] Stopping framework
> 'cd7fd4fe-1eff-438c-87f9-348461e6a640-1685'
> 16/10/05 16:57:53 INFO mesos.MesosCoarseGrainedSchedulerBackend:
> driver.run() returned with code DRIVER_STOPPED
> 16/10/05 16:57:53 INFO spark.MapOutputTrackerMasterEndpoint:
> MapOutputTrackerMasterEndpoint stopped!
> 16/10/05 16:57:53 INFO memory.MemoryStore: MemoryStore cleared
> 16/10/05 16:57:53 INFO storage.BlockManager: BlockManager stopped
> 16/10/05 16:57:53 INFO storage.BlockManagerMaster: BlockManagerMaster
> stopped
> 16/10/05 16:57:53 INFO
> scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:
> OutputCommitCoordinator stopped!
> 16/10/05 16:57:53 INFO spark.SparkContext: Successfully stopped
> SparkContext
> 16/10/05 16:57:54 INFO util.ShutdownHookManager: Shutdown hook called
> 16/10/05 16:57:54 INFO util.ShutdownHookManager: Deleting directory
>
> /tmp/spark-db6e7cc5-21d1-4895-a51f-ad797989d711/pyspark-04c90784-4350-4660-bac8-106fdfc7e6cf
> 16/10/05 16:57:54 INFO util.ShutdownHookManager: Deleting directory
> /tmp/spark-db6e7cc5-21d1-4895-a51f-ad797989d711
>
> Code was this (sorry, it's a copy-and-paste job - the stray `import sys`
> could be killed off, of course!)
>
> import sys
> from pyspark.sql import SparkSession
>
> sc = SparkSession\
>          .builder\
>          .appName("spark2test")\
>          .getOrCreate()
>
> df = sc.read.load('/prod/data/core/filtered/parquet/2016-08-01')
> print("Rowcount=", df.count())
> print("Unique devices=", df.select('device_id_time').distinct().count())
> #sc.stop()
>
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>
>
