If I stop and restart the query while a batch is being processed, what happens? Does that batch get canceled and then reprocessed when I click start? Does that mean I need to worry about duplicates downstream? Kafka consumers have pause and resume, and they work just fine, so I am not sure why Spark doesn't expose that.
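
On the duplicates concern: as far as I understand it, foreachBatch gives at-least-once delivery by default, so a batch interrupted by a stop can be replayed after a restart from the same checkpoint, and the replay carries the same batch id. A rough sketch of using that id to skip replays is below. IdempotentSink and write_batch are hypothetical names, and the in-memory set only stands in for wherever a real sink would persist the ids it has committed.

```python
class IdempotentSink:
    """Hypothetical sink that remembers which batch ids it already committed."""

    def __init__(self):
        self.committed_batch_ids = set()  # stand-in for durable storage
        self.rows = []

    def write_batch(self, rows, batch_id):
        """Write rows once per batch_id; return False if the batch was a replay."""
        if batch_id in self.committed_batch_ids:
            # Same batch replayed after a restart: skip it to avoid duplicates.
            return False
        self.rows.extend(rows)
        self.committed_batch_ids.add(batch_id)
        return True


sink = IdempotentSink()
first_attempt = sink.write_batch(["a", "b"], batch_id=0)
replayed_attempt = sink.write_batch(["a", "b"], batch_id=0)
```

In a real sink the data write and the record of the batch id would have to happen in one transaction, otherwise a crash between the two steps still leaves a window for duplicates.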

Exactly my question. I was also looking for ways to gracefully exit Spark Structured Streaming.


I am trying to see if there is a way to pause a Spark stream that processes data from Kafka, so that my application can take some actions while the stream is paused and resume it when the application completes those actions.
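
I am not aware of a pause or resume method on a streaming query, so the usual route seems to be stopping the query and starting it again with the same checkpoint location. One possible workaround, sketched below, is a gate that the foreachBatch function waits on before it touches each micro-batch. PauseGate, gate, processed and process_batch are made-up names for illustration, not Spark APIs, and the sketch assumes the batch function runs in the same driver process as the code that calls pause and resume.

```python
import threading


class PauseGate:
    """Hypothetical helper: blocks callers while paused, lets them through otherwise."""

    def __init__(self):
        self._running = threading.Event()
        self._running.set()  # start in the running state

    def pause(self):
        self._running.clear()

    def resume(self):
        self._running.set()

    def is_paused(self):
        return not self._running.is_set()

    def wait_until_running(self, timeout=None):
        # Returns True once the gate is open, False if the timeout expired first.
        return self._running.wait(timeout)


gate = PauseGate()
processed = []  # stand-in for the real per-batch work


def process_batch(batch_df, batch_id):
    # Intended to be passed to foreachBatch; blocks here while the gate is paused.
    gate.wait_until_running()
    processed.append(batch_id)
```

The application would call gate.pause(), do its work, then call gate.resume(). The pause only takes effect at the start of the next micro-batch, so a batch that is already past the gate still runs to completion, and a very long pause keeps that micro-batch open the whole time.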