airflow-dev mailing list archives

From Jarek Potiuk <>
Subject Re: Simplifying the KubernetesExecutor
Date Wed, 12 Aug 2020 14:48:05 GMT
Big +1. All the arguments are very appealing to me, and simplifying the
KubernetesExecutor down to a YAML-configurable one seems like a no-brainer,
especially if we provide some migration tools. I've lost countless hours
debugging configuration problems simply because the relevant
Kubernetes-related configuration was in the least expected place, i.e.
airflow.cfg, rather than in the YAML configuration.

I am also a big fan of both 1. and 2. I once implemented a POC of a
queue-based multi-scheduler, but I think embedding it in core Airflow
rather than basing it on queues (which are basically a Celery Executor
concept) is a much better approach.

Now, a question about timing. If we decide to go that route, my view is
that simplifying the KubernetesExecutor should be an Airflow 2.0 task,
alongside more comprehensive tests (which will be much easier to write in
this case). The new features/ideas 1. and 2. for the KubernetesExecutor
should, I think, come after that, once we release and stabilize 2.0. They
sound like great candidates for 2.1 to me.


On Wed, Aug 12, 2020 at 4:24 PM Daniel Imberman <> wrote:

> Hello, fellow Airflowers! I hope you are all well in these trying times.
> With preparations for Airflow 2.0 now under way, it seems like a good time
> to review the project's state and decide where we can fit in some breaking
> changes that will improve the project for the future.
> When we first created the KubernetesExecutor, we had two goals in mind.
> The first goal was to improve Airflow's autoscaling story. Previously,
> Airflow users had to manually provision Celery workers, which could lead
> to wasted resources or missed SLAs. The other goal was to introduce the
> Kubernetes system to a community that was not yet well versed in the
> Kubernetes API.
> To ease the community's transition, we abstracted away many of the
> complexities of creating a Kubernetes object. We chose to offer a limited
> number of configurations and keep much of the pod creation process internal
> to Airflow. In the short term, this system lowered the barrier to entry.
> Over time, however, this abstraction has become a nightmare of tech debt,
> as the Kubernetes API is extensive and constantly changing.
> With this in mind, I think it's time for us to consider a more
> straightforward approach that takes the complexity out of Airflow and
> offers the full Kubernetes API to the Airflow user.
> What I'm proposing here is pretty straightforward. We remove all
> Kubernetes pod creation configurations from the airflow.cfg and instead
> offer only one way to use the KubernetesExecutor: with a YAML file.
> We can easily supply all of the configurations to the KubernetesExecutor
> by offering example YAMLs (git sync mode is just a sidecar and an init
> container, DAG volumes are just an example volume and volume mount, etc.).
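To make the git-sync example concrete, here is a minimal sketch that builds such a base pod as a plain Python dict and prints it as JSON, which YAML 1.2 parsers also accept. The image names, repository URL, mount path and volume name are placeholders, and the `GIT_SYNC_*` environment variables assume the git-sync v3 image; this is an illustration, not an official Airflow template.

```python
import json

# Placeholder values (assumptions for illustration, not Airflow defaults).
DAGS_VOLUME = "dags"
DAGS_PATH = "/opt/airflow/dags"
AIRFLOW_IMAGE = "example.registry/airflow:custom"
GIT_SYNC_IMAGE = "example.registry/git-sync:v3"
REPO_URL = "https://example.com/org/dags.git"


def git_sync_container(name, one_time):
    """git-sync container spec; one_time=True clones once and exits (init container)."""
    return {
        "name": name,
        "image": GIT_SYNC_IMAGE,
        "env": [
            {"name": "GIT_SYNC_REPO", "value": REPO_URL},
            {"name": "GIT_SYNC_ROOT", "value": DAGS_PATH},
            {"name": "GIT_SYNC_ONE_TIME", "value": str(one_time).lower()},
        ],
        "volumeMounts": [{"name": DAGS_VOLUME, "mountPath": DAGS_PATH}],
    }


def base_pod_template():
    """Base worker pod: git sync is an init container plus a sidecar sharing one volume."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "airflow-worker"},
        "spec": {
            # Runs to completion before the worker starts, so DAGs exist at startup.
            "initContainers": [git_sync_container("git-sync-init", one_time=True)],
            "containers": [
                {
                    "name": "base",  # the Airflow worker container
                    "image": AIRFLOW_IMAGE,
                    "volumeMounts": [
                        {"name": DAGS_VOLUME, "mountPath": DAGS_PATH, "readOnly": True}
                    ],
                },
                # Sidecar keeps syncing new commits while the task runs.
                git_sync_container("git-sync", one_time=False),
            ],
            # The "DAG volume" is just an ordinary volume plus volume mounts.
            "volumes": [{"name": DAGS_VOLUME, "emptyDir": {}}],
        },
    }


if __name__ == "__main__":
    # YAML 1.2 parsers accept JSON, so this output can be saved as a template file.
    print(json.dumps(base_pod_template(), indent=2))
```

Nothing here is Airflow-specific except the idea of a base pod: every feature is ordinary Kubernetes pod spec, which is what makes the resulting pod predictable.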
> This approach would make it easier for users to predict what a pod will
> look like once Airflow launches it. They will know it starts from the base
> pod, which they can modify using the executor config and the pod mutation
> hook.
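The "base pod plus modifications" model can be sketched with plain dicts: start from the template, overlay a per-task override, then let a cluster-wide hook have the last word. The merge rules below (dicts merge recursively, lists of named items such as containers merge by `name`, anything else is replaced) are a simplification chosen for illustration, not Airflow's actual reconciliation logic. In Airflow the hook is `pod_mutation_hook` in `airflow_local_settings.py` and it alters a pod object in place; `example_mutation_hook` and its label are hypothetical stand-ins that imitate this on a dict.

```python
import copy


def _all_named(items):
    """True if every item is a dict carrying a 'name' key (e.g. containers, volumes)."""
    return all(isinstance(i, dict) and "name" in i for i in items)


def merge(base, override):
    """Return a new object with override layered on top of base (naive rules)."""
    if isinstance(base, dict) and isinstance(override, dict):
        out = copy.deepcopy(base)
        for key, value in override.items():
            out[key] = merge(out[key], value) if key in out else copy.deepcopy(value)
        return out
    if isinstance(base, list) and isinstance(override, list) and _all_named(base + override):
        out = copy.deepcopy(base)
        index = {item["name"]: pos for pos, item in enumerate(out)}
        for item in override:
            if item["name"] in index:
                pos = index[item["name"]]
                out[pos] = merge(out[pos], item)
            else:
                index[item["name"]] = len(out)
                out.append(copy.deepcopy(item))
        return out
    # Scalars, and lists without names, are simply replaced by the override.
    return copy.deepcopy(override)


def example_mutation_hook(pod):
    """Hypothetical cluster-wide policy: label every pod, mutating it in place."""
    pod.setdefault("metadata", {}).setdefault("labels", {})["team"] = "data-platform"


def build_pod(base_pod, executor_config, mutation_hook):
    """Base pod, then the per-task executor config, then the hook (applied last here)."""
    pod = merge(base_pod, executor_config or {})
    mutation_hook(pod)
    return pod


if __name__ == "__main__":
    base = {"metadata": {"name": "worker"},
            "spec": {"containers": [{"name": "base", "image": "example/airflow"}]}}
    override = {"spec": {"containers": [
        {"name": "base", "resources": {"limits": {"memory": "2Gi"}}}]}}
    print(build_pod(base, override, example_mutation_hook))
```

Because `merge` deep-copies, the shared base template is never modified, so one template can safely serve every task.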
> This simplification could also lead to some pretty great new features in
> the KubernetesExecutor:
> Idea 1: Picking a pod_template_file per-task
> Along with the existing customization through the executor config, relying
> solely on pod template files would let users pick the pod template file
> they want to use as their base pod on a per-task basis. An Airflow engineer
> could supply several pre-made templates for their data scientists to reduce
> the amount of customization an Airflow user would need to do.
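Per-task template selection could look roughly like the following sketch: a small catalogue of pre-made templates that an Airflow engineer maintains, and a lookup that reads the task's executor config and falls back to a default. The directory, the template names and the use of a `pod_template_file` key in the executor config are all assumptions made for illustration.

```python
from pathlib import Path

# Hypothetical catalogue of pre-made templates maintained by an Airflow engineer.
TEMPLATE_DIR = Path("/opt/airflow/pod_templates")  # placeholder location
TEMPLATES = {
    "default": TEMPLATE_DIR / "default.yaml",
    "gpu": TEMPLATE_DIR / "gpu.yaml",
    "high-memory": TEMPLATE_DIR / "high_memory.yaml",
}


def resolve_pod_template(executor_config=None):
    """Pick the base pod template for one task, falling back to the default."""
    name = (executor_config or {}).get("pod_template_file", "default")
    try:
        return TEMPLATES[name]
    except KeyError:
        raise ValueError(
            f"unknown pod template {name!r}; choose one of {sorted(TEMPLATES)}"
        ) from None


if __name__ == "__main__":
    for config in (None, {"pod_template_file": "gpu"}):
        print(resolve_pod_template(config))
```

Failing loudly on an unknown name, rather than silently using the default, keeps a typo from launching a task on the wrong base pod.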
> Idea 2: Merging the KubernetesExecutor into the CeleryExecutor
> One idea that we've been excited about recently is creating a merged
> Celery and Kubernetes executor. This hybrid executor would default to
> launching Celery workers autoscaled with KEDA and would have the option to
> launch individual tasks using the KubernetesExecutor when a user wants
> isolation or customization. Simplifying the KubernetesExecutor reduces the
> number of failure points that this merged executor would need to account for.
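The routing decision at the heart of such a hybrid executor might be sketched as below. `TaskSpec`, the `isolated` flag and the rule "any executor config means Kubernetes" are hypothetical choices for illustration; the proposal itself only says that Celery is the default and Kubernetes is used when a user wants isolation or customization.

```python
from dataclasses import dataclass, field


@dataclass
class TaskSpec:
    """Hypothetical, minimal view of a task as a hybrid executor might see it."""
    task_id: str
    executor_config: dict = field(default_factory=dict)
    isolated: bool = False  # hypothetical flag: the user asks for a dedicated pod


def choose_backend(task):
    """Tasks wanting isolation or pod customization go to Kubernetes; the rest to Celery."""
    if task.isolated or task.executor_config:
        return "kubernetes"
    return "celery"


def partition(tasks):
    """Group task ids by backend, preserving input order."""
    groups = {"celery": [], "kubernetes": []}
    for task in tasks:
        groups[choose_backend(task)].append(task.task_id)
    return groups


if __name__ == "__main__":
    tasks = [
        TaskSpec("extract"),
        TaskSpec("train", executor_config={"pod_template_file": "gpu"}),
        TaskSpec("untrusted", isolated=True),
    ]
    print(partition(tasks))
    # {'celery': ['extract'], 'kubernetes': ['train', 'untrusted']}
```

Defaulting to Celery keeps ordinary tasks on already-running workers, so only the tasks that need it pay the cost of starting a dedicated pod.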
> What would we need to do to implement this?
> The good news here is that the hard work has already been done! As of
> AIRFLOW-5413 by David Lum, Airflow can already build base worker pods from
> a template file. Implementing this proposal would primarily involve
> deleting code and writing very little new code.
> Thank you for your time and I look forward to the community’s discussion.
> Daniel


Jarek Potiuk
Polidea <> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
