got it. seems like i'd better stay away from this feature for now.


On Wed, Mar 5, 2014 at 5:55 PM, Mayur Rustagi <mayur.rustagi@gmail.com> wrote:
One issue is that job cancellation is posted on the event loop. So it's possible that subsequent jobs submitted to the job queue may beat the job cancellation event, and hence the job cancellation event may end up closing them too.
So there's definitely a race condition you are risking, even if you are not running into it here.
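
A toy model of the ordering described above, for illustration only. It is not Spark's scheduler code: `ToyScheduler` and its methods are hypothetical names, and it assumes that a submitted job is registered immediately while a cancellation is only queued until the event loop runs. Under those assumptions, a job submitted after the cancel call but before the loop processes it is cancelled as well if it reuses the group id. The model covers only this ordering issue and says nothing about the stuck stage reported further down the thread.

```python
from collections import deque


class ToyScheduler:
    """Hypothetical stand-in for a scheduler with an asynchronous event loop."""

    def __init__(self):
        self.events = deque()   # pending events, handled in FIFO order
        self.active = {}        # job_id -> group_id for jobs still running
        self.cancelled = []     # job ids removed by a cancellation event
        self._next_id = 0

    def submit_job(self, group_id):
        # Assumption: submission registers the job right away.
        job_id = self._next_id
        self._next_id += 1
        self.active[job_id] = group_id
        return job_id

    def cancel_job_group(self, group_id):
        # Assumption: cancellation is only posted; nothing is cancelled yet.
        self.events.append(("cancel_group", group_id))

    def run_event_loop(self):
        # Drain the queue; a cancel event hits every job that is in the
        # group at processing time, not at the time cancel was requested.
        while self.events:
            kind, group_id = self.events.popleft()
            if kind == "cancel_group":
                for job_id, job_group in list(self.active.items()):
                    if job_group == group_id:
                        del self.active[job_id]
                        self.cancelled.append(job_id)


scheduler = ToyScheduler()
old_job = scheduler.submit_job("g1")
scheduler.cancel_job_group("g1")            # posted, not yet processed
late_job = scheduler.submit_job("g1")       # same group, submitted after cancel
other_job = scheduler.submit_job("g2")      # fresh group id
scheduler.run_event_loop()

print(scheduler.cancelled)  # old_job and late_job are both cancelled
print(scheduler.active)     # only the job in group "g2" is left
```

In this model the job with a fresh group id is untouched by the late cancellation, which is one reason to avoid reusing group ids across jobs.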



On Wed, Mar 5, 2014 at 2:40 PM, Koert Kuipers <koert@tresata.com> wrote:
SparkContext.cancelJobGroup
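
For reference, a minimal sketch of how this API is typically paired with `SparkContext.setJobGroup`, written against the PySpark method names. The helper functions `new_group_id`, `run_in_group` and `cancel_group` are hypothetical, and `sc` is assumed to be an existing SparkContext. A UUID is used as the group id, matching the style of id visible in the logs in this thread. How job groups interact with Python threads varies by Spark version, so treat this as a sketch and check the docs for yours.

```python
import uuid


def new_group_id():
    """Return a fresh, unique job group id (a random UUID string)."""
    return str(uuid.uuid4())


def run_in_group(sc, group_id, description, action):
    """Tag jobs started by `action` in this thread with `group_id`, then run it."""
    sc.setJobGroup(group_id, description)
    return action()


def cancel_group(sc, group_id):
    """Ask Spark to cancel all active jobs tagged with `group_id`."""
    sc.cancelJobGroup(group_id)


# Intended usage (requires a live SparkContext `sc` and an RDD `rdd`):
#   gid = new_group_id()
#   worker thread:   run_in_group(sc, gid, "long job", lambda: rdd.count())
#   another thread:  cancel_group(sc, gid)
```

Generating a new id per job means a cancellation request can only ever match the job it was meant for.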


On Wed, Mar 5, 2014 at 5:32 PM, Mayur Rustagi <mayur.rustagi@gmail.com> wrote:
How do you cancel the job? Which API do you use?



On Wed, Mar 5, 2014 at 2:29 PM, Koert Kuipers <koert@tresata.com> wrote:
i also noticed that jobs (with a new JobGroupId) which i run after this and which use the same RDDs get very confused. i see lots of cancelled stages and retries that go on forever.


On Tue, Mar 4, 2014 at 5:02 PM, Koert Kuipers <koert@tresata.com> wrote:
i have a running job that i cancel while keeping the spark context alive.

at the time of cancellation the active stage is 14.

i see in logs:
2014/03/04 16:43:19 INFO scheduler.DAGScheduler: Asked to cancel job group 3a25db23-2e39-4497-b7ab-b26b2a976f9c
2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage 10
2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage 14
2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Stage 14 was cancelled
2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Remove TaskSet 14.0 from pool x
2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage 13
2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage 12
2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage 11
2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage 15

so far it all looks good. then i get a lot of messages like this:
2014/03/04 16:43:20 INFO scheduler.TaskSchedulerImpl: Ignoring update with state FINISHED from TID 883 because its task set is gone
2014/03/04 16:43:24 INFO scheduler.TaskSchedulerImpl: Ignoring update with state KILLED from TID 888 because its task set is gone

after this, stage 14 hangs around in active stages, without any sign of progress or cancellation. it just sits there forever, stuck. looking at the logs of the executors confirms this. the tasks seem to be still running, but nothing is happening. for example (by the time i look at this it's 4:58, so this task hasn't done anything in 15 mins):

14/03/04 16:43:16 INFO Executor: Serialized size of result for 943 is 1007
14/03/04 16:43:16 INFO Executor: Sending result for 943 directly to driver
14/03/04 16:43:16 INFO Executor: Finished task ID 943
14/03/04 16:43:16 INFO Executor: Serialized size of result for 945 is 1007
14/03/04 16:43:16 INFO Executor: Sending result for 945 directly to driver
14/03/04 16:43:16 INFO Executor: Finished task ID 945
14/03/04 16:43:19 INFO BlockManager: Removing RDD 66

not sure what to make of this. any suggestions? best, koert