airflow-dev mailing list archives

From Elser Rosa Leiva <>
Subject Re: [Discuss] Airflow sensor optimization
Date Thu, 07 Mar 2019 18:35:16 GMT
On 2019/03/06 14:31:57, Yingbo Wang <> wrote:
> hi,
> I would like to open an AIP for Airflow sensor optimization.
> *Motivation*:
> Low efficiency in Airflow Sensor Implementation
> Sensors are a special kind of operator that will keep running until a
> certain criterion is met. Examples include a specific file landing in
> HDFS or S3, a partition appearing in Hive, or a specific time of the day.
> Sensors are derived from BaseSensorOperator and run a poke method at a
> specified poke_interval until it returns True.
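
A minimal sketch of the poke loop described above, in plain Python rather than Airflow's own code. The function name run_sensor and the timeout, sleep and clock parameters are assumptions made for illustration; only poke and poke_interval come from the quoted text.

```python
import time
from typing import Callable


def run_sensor(
    poke: Callable[[], bool],
    poke_interval: float,
    timeout: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Call poke() every poke_interval seconds until it returns True.

    Returns the number of pokes made. Raises TimeoutError if the criterion
    is still unmet after `timeout` seconds (hypothetical parameter).
    The calling process stays occupied for the whole wait.
    """
    started = clock()
    pokes = 0
    while True:
        pokes += 1
        if poke():
            return pokes
        if clock() - started > timeout:
            raise TimeoutError("sensor criterion not met before timeout")
        sleep(poke_interval)
```

The loop holds its process between pokes even though it is mostly sleeping.
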
> The sensor tasks are inefficient because, in the current design, we
> spawn a separate worker process for each partition sensor. The
> worker might last a long time, until the target partition is available.
> In the case where there are many sensor tasks that need to run within
> time limits, we have to allocate a lot of resources to have enough
> for the sensor tasks.
> *Idea:*
> We propose two approaches that could address this issue, batch-sensor
> and smart-sensor.
> Batch-sensor
> The basic idea of batch-sensor is to batch process sensor tasks to save
> resources. While running, a batch-sensor will take N partition sensor
> requests as input and poke those N partitions periodically. If the
> batch-sensor finds that the criterion of some sensor task is met, the
> batch-sensor will update the database for that sensor task.
> To do this, we need to create a sensor base class called ‘batchable’ and
> make all sensors inherit from this base class. We also need to change the
> behavior of the scheduler regarding batchable sensor tasks. The scheduler
> would find as many batchable sensor tasks as possible and run those tasks
> in a batch.
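
The batch-sensor idea could be sketched roughly as below: one process holds N sensor requests and pokes them all on each pass. This is a plain-Python illustration, not Airflow code; poke_batch, run_batch_sensor, the pending mapping and the mark_done callback (standing in for the database update) are all hypothetical names.

```python
import time
from typing import Callable, Dict, Hashable, List


def poke_batch(
    pending: Dict[Hashable, Callable[[], bool]],
    mark_done: Callable[[Hashable], None],
) -> List[Hashable]:
    """Poke every pending sensor request once.

    `pending` maps a request key (e.g. a partition name) to its poke
    callable. Requests whose criterion is met are reported through
    `mark_done` and removed from `pending`. Returns the keys completed
    in this pass, in the order they were stored.
    """
    done = [key for key, poke in pending.items() if poke()]
    for key in done:
        mark_done(key)
        del pending[key]
    return done


def run_batch_sensor(
    pending: Dict[Hashable, Callable[[], bool]],
    mark_done: Callable[[Hashable], None],
    poke_interval: float,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Keep poking the whole batch until no request is left."""
    while pending:
        poke_batch(pending, mark_done)
        if pending:
            sleep(poke_interval)
```

With this shape, N waiting sensors cost one sleeping process instead of N.
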
> Smart-sensor
> Smart-sensor is an improvement on top of batch-sensor.
> The idea of smart-sensor is that the worker process of smart-sensor will
> run like a service. To do this, we need to persist sensor details in the
> Airflow DB; the worker process periodically queries task instances
> to find sensor tasks, pokes the metastore, and updates the task instance
> if it detects that a certain partition or file has been created.
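
One pass of such a service might look like the sketch below. It uses an in-memory-capable SQLite table as a stand-in for the Airflow DB; the sensor_task table, its columns, the state names and the target_exists callback (standing in for the metastore check) are assumptions for illustration, not Airflow's actual schema.

```python
import sqlite3
from typing import Callable, List


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a hypothetical table holding the persisted sensor details."""
    conn.execute(
        "CREATE TABLE sensor_task ("
        " task_id TEXT PRIMARY KEY,"
        " target TEXT NOT NULL,"
        " state TEXT NOT NULL DEFAULT 'sensing')"
    )


def service_pass(
    conn: sqlite3.Connection,
    target_exists: Callable[[str], bool],
) -> List[str]:
    """Run one pass: find waiting sensor tasks, poke, update their state.

    Returns the ids of the tasks whose target was found in this pass.
    """
    rows = conn.execute(
        "SELECT task_id, target FROM sensor_task"
        " WHERE state = 'sensing' ORDER BY task_id"
    ).fetchall()
    finished = [task_id for task_id, target in rows if target_exists(target)]
    conn.executemany(
        "UPDATE sensor_task SET state = 'success' WHERE task_id = ?",
        [(task_id,) for task_id in finished],
    )
    conn.commit()
    return finished
```

A long-running worker would call service_pass on a timer. Because the sensor details live in the database, new sensor tasks are picked up on the next pass without starting another process.
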
