Hello all
   I have a Spark job that reads Parquet data and partitions it based on one of the columns. I made sure the partitions are equally distributed and not skewed. My code looks like this -

datasetA.write.partitionBy("column1").parquet(outputPath)

Execution plan - [inline image of the execution plan, omitted]

All tasks (~12,000) finish in 30-35 minutes, but it takes another 40-45 minutes for the application to close. I am not sure what Spark is doing after all the tasks have completed successfully.
I checked the thread dumps (via the Executors tab in the UI) on a few executors but couldn't find anything notable. Overall, a few shuffle-client threads are RUNNABLE and a few dispatched-* threads are WAITING.

Please let me know what Spark is doing at this stage (after all tasks have finished) and whether there is any way I can optimize it.

Thanks
Swapnil