spark-user mailing list archives

From Mayur Rustagi <mayur.rust...@gmail.com>
Subject Re: trying to understand job cancellation
Date Wed, 05 Mar 2014 22:32:35 GMT
How do you cancel the job? Which API do you use?

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>



On Wed, Mar 5, 2014 at 2:29 PM, Koert Kuipers <koert@tresata.com> wrote:

> i also noticed that jobs (with a new JobGroupId) which i run after this
> and which use the same RDDs get very confused. i see lots of cancelled
> stages and retries that go on forever.
>
>
> On Tue, Mar 4, 2014 at 5:02 PM, Koert Kuipers <koert@tresata.com> wrote:
>
>> i have a running job that i cancel while keeping the spark context alive.
>>
>> at the time of cancellation the active stage is 14.
>>
>> i see in logs:
>> 2014/03/04 16:43:19 INFO scheduler.DAGScheduler: Asked to cancel job
>> group 3a25db23-2e39-4497-b7ab-b26b2a976f9c
>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage 10
>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage 14
>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Stage 14 was
>> cancelled
>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Remove TaskSet 14.0
>> from pool x
>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage 13
>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage 12
>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage 11
>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage 15
>>
>> so far it all looks good. then i get a lot of messages like this:
>> 2014/03/04 16:43:20 INFO scheduler.TaskSchedulerImpl: Ignoring update
>> with state FINISHED from TID 883 because its task set is gone
>> 2014/03/04 16:43:24 INFO scheduler.TaskSchedulerImpl: Ignoring update
>> with state KILLED from TID 888 because its task set is gone
>>
>> after this stage 14 hangs around in active stages, without any sign of
>> progress or cancellation. it just sits there forever, stuck. looking at the
>> logs of the executors confirms this. the tasks seem to be still running,
>> but nothing is happening. for example (by the time i look at this it's 4:58
>> so this task hasn't done anything in 15 mins):
>>
>> 14/03/04 16:43:16 INFO Executor: Serialized size of result for 943 is 1007
>> 14/03/04 16:43:16 INFO Executor: Sending result for 943 directly to driver
>> 14/03/04 16:43:16 INFO Executor: Finished task ID 943
>> 14/03/04 16:43:16 INFO Executor: Serialized size of result for 945 is 1007
>> 14/03/04 16:43:16 INFO Executor: Sending result for 945 directly to driver
>> 14/03/04 16:43:16 INFO Executor: Finished task ID 945
>> 14/03/04 16:43:19 INFO BlockManager: Removing RDD 66
>>
>> not sure what to make of this. any suggestions? best, koert
>>
>
>
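
To make the pattern in the quoted driver log easier to see, here is a minimal sketch in plain Python (no Spark needed) that tallies which stages were asked to cancel, which were reported as cancelled, and which task updates were ignored. The log lines are copied from the message above and unwrapped to one entry per line; the `summarize` function and its regular expressions are hypothetical helpers written for this illustration, not part of Spark. It does not diagnose the hang, and a missing "was cancelled" line may simply mean that stage had no active task set.

```python
import re

# Driver log lines copied from the quoted message, unwrapped to one entry per line.
DRIVER_LOG = """\
2014/03/04 16:43:19 INFO scheduler.DAGScheduler: Asked to cancel job group 3a25db23-2e39-4497-b7ab-b26b2a976f9c
2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage 10
2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage 14
2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Stage 14 was cancelled
2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Remove TaskSet 14.0 from pool x
2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage 13
2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage 12
2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage 11
2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage 15
2014/03/04 16:43:20 INFO scheduler.TaskSchedulerImpl: Ignoring update with state FINISHED from TID 883 because its task set is gone
2014/03/04 16:43:24 INFO scheduler.TaskSchedulerImpl: Ignoring update with state KILLED from TID 888 because its task set is gone
"""

# Hypothetical patterns, matched against the message text shown in the log above.
CANCELLING = re.compile(r"Cancelling stage (\d+)")
CANCELLED = re.compile(r"Stage (\d+) was cancelled")
IGNORED = re.compile(r"Ignoring update with state (\w+) from TID (\d+)")


def summarize(log_text):
    """Collect cancel requests, cancel confirmations and ignored task updates."""
    requested, confirmed, ignored = [], [], []
    for line in log_text.splitlines():
        if (m := CANCELLING.search(line)):
            requested.append(int(m.group(1)))
        elif (m := CANCELLED.search(line)):
            confirmed.append(int(m.group(1)))
        elif (m := IGNORED.search(line)):
            ignored.append((int(m.group(2)), m.group(1)))
    return {"requested": requested, "confirmed": confirmed, "ignored": ignored}


summary = summarize(DRIVER_LOG)
# Stages that were asked to cancel but have no "was cancelled" line in this excerpt.
unconfirmed = [s for s in summary["requested"] if s not in summary["confirmed"]]
print(summary)
print("no confirmation logged for stages:", unconfirmed)
```

In this excerpt only stage 14 gets a "was cancelled" line, and the two ignored updates (TID 883 and TID 888) arrive after its task set was removed.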
