airflow-dev mailing list archives

From Daniel Imberman <daniel.imber...@gmail.com>
Subject Re: Optimize KubernetesExecutor pod labels to task instance key
Date Mon, 24 Aug 2020 16:31:04 GMT
Hi Ping,

I think that’s a great idea! Would be glad to help merge this.

On Sun, Aug 23, 2020 at 11:33 PM, Ping Zhang <pingzh@umich.edu> wrote:
Hi everyone,

I was evaluating *KubernetesExecutor* and found that `*_labels_to_key*` is
inefficient, see code
<https://github.com/apache/airflow/blob/master/airflow/executors/kubernetes_executor.py#L608-L674>.
On a large Airflow cluster it can fall back to a very expensive DB query
whenever the dag_id or task_id contains characters outside the character set
allowed for Kubernetes labels
<https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#syntax-and-character-set>
.
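
To illustrate when an id cannot be stored in a label unchanged, here is a
minimal sketch of the label value rule from the Kubernetes documentation
linked above (at most 63 characters, alphanumeric at both ends, with `-`, `_`
and `.` allowed in between). The function name `is_valid_label_value` is
hypothetical and is not part of Airflow.

```python
import re

# Label value syntax per the Kubernetes docs: empty, or up to 63 characters
# that start and end with an alphanumeric, with '-', '_', '.' allowed inside.
_LABEL_VALUE_RE = re.compile(r"^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$")
_MAX_LABEL_LENGTH = 63


def is_valid_label_value(value: str) -> bool:
    """Return True if `value` can be used as a label value without changes."""
    return len(value) <= _MAX_LABEL_LENGTH and bool(_LABEL_VALUE_RE.match(value))


if __name__ == "__main__":
    for candidate in ("example_dag", "team/etl dag", "a" * 64):
        print(repr(candidate[:20]), is_valid_label_value(candidate))
```

Any dag_id or task_id for which this check fails has to be altered before it
goes into a label, so the original id cannot be read back from the label alone.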

I propose using Pod annotations to record the task instance key
information, since annotation values do not have this restriction.
In the event stream from k8s, the annotations can be retrieved via `
*task.metadata.annotations*`, with a code example
<https://gist.github.com/pingzh/f3488116304b81d73d1bed3c53a5c85f#file-stream_pod-py>
.
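
A minimal sketch of the round trip this implies, assuming the key consists of
dag_id, task_id, execution_date and try_number. This is not the code from the
gist: the `TaskInstanceKey` tuple, the annotation key names and both helper
functions are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timezone
from typing import Dict, NamedTuple


class TaskInstanceKey(NamedTuple):
    """Hypothetical stand-in for the task instance key fields."""
    dag_id: str
    task_id: str
    execution_date: datetime
    try_number: int


def key_to_annotations(key: TaskInstanceKey) -> Dict[str, str]:
    """Serialize the key into string annotations (annotation values are strings)."""
    return {
        "dag_id": key.dag_id,
        "task_id": key.task_id,
        "execution_date": key.execution_date.isoformat(),
        "try_number": str(key.try_number),
    }


def annotations_to_key(annotations: Dict[str, str]) -> TaskInstanceKey:
    """Rebuild the key from pod annotations without any database lookup."""
    return TaskInstanceKey(
        dag_id=annotations["dag_id"],
        task_id=annotations["task_id"],
        execution_date=datetime.fromisoformat(annotations["execution_date"]),
        try_number=int(annotations["try_number"]),
    )


if __name__ == "__main__":
    # Ids with characters that are not allowed in label values.
    key = TaskInstanceKey(
        "team/etl dag", "load:step#1", datetime(2020, 8, 23, tzinfo=timezone.utc), 1
    )
    print(annotations_to_key(key_to_annotations(key)) == key)
```

In a watcher built on the Kubernetes Python client, the dictionary passed to
`annotations_to_key` would come from `event["object"].metadata.annotations`
for each event yielded by the pod watch stream.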

Please let me know your thoughts before I start to upstream my changes.

Best wishes

Ping Zhang