spark-user mailing list archives

From 陆巍|Wei Lu(RD) <>
Subject many 'activity' job are pending
Date Fri, 15 Jul 2016 16:17:41 GMT
Hi there,

I am running into a “many active jobs” issue when using direct Kafka streaming on YARN
(Spark 1.5, Hadoop 2.6, CDH 5.5.1).

The problem happens when Kafka has almost NO traffic.

In the application UI, I see many ‘active’ jobs that stay pending for hours. Eventually the driver
logs “Requesting 4 new executors because tasks are backlogged”.

However, the driver log for one of these ‘active’ jobs says the job has finished.
So why does the application UI keep showing the job as active, seemingly forever?


Here is the relevant log output for one of the ‘active’ jobs.
It has two stages: a flatMap followed by a reduceByKey. The log says each stage finished
in about 20 ms and the whole job finished in 64 ms.

Got job 6567
Final stage: ResultStage 9851(foreachRDD at
Parents of final stage: List(ShuffleMapStage 9850)
Missing parents: List(ShuffleMapStage 9850)
Finished task 0.0 in stage 9850.0 (TID 29551) in 20 ms
Removed TaskSet 9850.0, whose tasks have all completed, from pool
ShuffleMapStage 9850 (flatMap at OpaTransLogAnalyzeWithShuffle.scala:83) finished in 0.022
Submitting ResultStage 9851 (ShuffledRDD[16419] at reduceByKey at OpaTransLogAnalyzeWithShuffle.scala:83),
which is now runnable
ResultStage 9851 (foreachRDD at OpaTransLogAnalyzeWithShuffle.scala:84) finished in 0.023
Job 6567 finished: foreachRDD at OpaTransLogAnalyzeWithShuffle.scala:84, took 0.064372 s
Finished job streaming job 1468592373000 ms.1 from job set of time 1468592373000 ms
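
To cross-check the UI against the driver log, a small script along these lines could list the jobs that logged "Got job" but never a matching "Job ... finished" line. This is only a sketch: the two patterns are taken from the excerpt above, `summarize_jobs` is a hypothetical helper name, and job 6568 in the sample input is made up for illustration.

```python
import re

# Patterns based on the driver log lines quoted above.
GOT_JOB = re.compile(r"Got job (\d+)")
JOB_FINISHED = re.compile(r"Job (\d+) finished: .*?took ([\d.]+) s")


def summarize_jobs(lines):
    """Return (finished, unfinished).

    finished maps job id -> reported duration in seconds;
    unfinished is a sorted list of job ids that started but
    never logged a "Job N finished" line.
    """
    started = set()
    finished = {}
    for line in lines:
        m = GOT_JOB.search(line)
        if m:
            started.add(int(m.group(1)))
        m = JOB_FINISHED.search(line)
        if m:
            finished[int(m.group(1))] = float(m.group(2))
    unfinished = sorted(started - set(finished))
    return finished, unfinished


sample = [
    "Got job 6567",
    "Job 6567 finished: foreachRDD at OpaTransLogAnalyzeWithShuffle.scala:84, took 0.064372 s",
    "Got job 6568",  # hypothetical job with no matching "finished" line
]
finished, unfinished = summarize_jobs(sample)
print(finished)
print(unfinished)
```

If every job shown as ‘active’ in the UI appears in `finished`, the mismatch is between the UI and the log rather than in the jobs themselves.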

Wei Lu