spark-dev mailing list archives

From Anirudh Ramanathan <>
Subject Re: [Kubernetes] structured-streaming driver restarts / roadmap
Date Wed, 28 Mar 2018 14:28:10 GMT
We discussed this early on in our fork and I think we should have this in a
JIRA and discuss it further. It's something we want to address in the

One proposed method is using a StatefulSet of size 1 for the driver. This
ensures recovery but at the same time takes away from the completion
semantics of a single pod.
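For reference, a StatefulSet of size 1 wrapping the driver could be sketched roughly like the fragment below. All names, labels, and the image are hypothetical; Spark does not generate these resources today, and this is only an illustration of the proposal, not its implementation.

```yaml
# Hypothetical sketch: a StatefulSet of size 1 for the driver pod.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: spark-driver
spec:
  serviceName: spark-driver        # assumes a matching headless Service exists
  replicas: 1                      # exactly one driver instance at a time
  selector:
    matchLabels:
      app: spark-driver
  template:
    metadata:
      labels:
        app: spark-driver
    spec:
      containers:
        - name: driver
          image: my-spark-app:latest   # hypothetical application image
```

The trade-off mentioned above follows from the controller semantics: a StatefulSet always recreates its pod after termination, so a long-running streaming driver gets recovery for free, but a batch driver can never settle into a terminal "completed" state the way a bare pod with `restartPolicy: Never` does.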

See history in

On Wed, Mar 28, 2018, 6:56 AM Lucas Kacher <> wrote:

> A carry-over from the apache-spark-on-k8s project, it would be useful to
> have a configurable restart policy for submitted jobs with the Kubernetes
> resource manager. See the following issues:
> Use case: I have a structured streaming job that reads from Kafka,
> aggregates, and writes back out to Kafka deployed via k8s and checkpointing
> to a remote location. If the driver pod dies for any number of reasons,
> it will not restart.
> For us, as all data is stored via checkpoint and we are satisfied with
> at-least-once semantics, it would be useful if the driver were to come back
> on its own and pick back up.
> Firstly, may we add this to JIRA? Secondly, is there any insight into
> whether this will be made configurable in the future?
> If it's not something that will happen natively, we will end up needing to
> write something that polls or listens to k8s and has the ability to
> re-submit any failed jobs.
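The external workaround described above — watching Kubernetes for failed drivers and re-submitting them — might look roughly like this. This is a sketch under assumptions: `needs_resubmit` and `resubmit` are hypothetical helper names, and the watch loop is only outlined in comments using the official Kubernetes Python client.

```python
# Hypothetical sketch of an external "re-submitter" for Spark driver pods.
# Only the decision logic is implemented; the watch loop is outlined below.

def needs_resubmit(pod_phase, restart_policy="Never"):
    """Return True when an external watcher should re-submit the job.

    With restartPolicy=Never (the current behavior of submitted driver
    pods), any terminal failure requires an external re-submission; with
    any other policy, kubelet restarts the container itself.
    """
    return pod_phase == "Failed" and restart_policy == "Never"

# A watcher loop (outline only) would stream pod events via the official
# Kubernetes Python client and shell out to spark-submit again:
#
#   from kubernetes import client, config, watch
#   config.load_incluster_config()
#   v1 = client.CoreV1Api()
#   for event in watch.Watch().stream(v1.list_namespaced_pod,
#                                     namespace="spark"):
#       pod = event["object"]
#       if needs_resubmit(pod.status.phase):
#           resubmit(pod)  # hypothetical: re-run spark-submit with saved args
```

Because the checkpoint lives in a remote location, re-submission alone is enough to resume from the last committed offsets with at-least-once semantics, as described above.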
> Thanks!
> --
> *Lucas Kacher*, Senior Engineer
> New York, NY
> 818.512.5239
