airflow-dev mailing list archives

From Brian Greene <br...@heisenbergwoodworking.com>
Subject Re: Possible bug: Airflow frequently fail with AWS RDS backend when #tasks increases
Date Sun, 16 Aug 2020 00:21:42 GMT
When I had a similar issue, it turned out that the way the task(s) were
written, they'd rapidly open a large number of new RDS connections.

AWS RDS, particularly if you're using the cluster endpoint, performs a
DNS lookup (4 hops, if I recall correctly) before your connection
request actually resolves to a real host.  This lookup is throttled,
and after a certain number of hits in a short window it will return the
error above (which is annoying, as it makes it look like the DB just
'vanishes' from time to time).
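One way to avoid that churn is to keep a small SQLAlchemy connection pool and reuse it, so each unit of work checks out an existing connection instead of resolving the endpoint again; Airflow exposes similar knobs (`sql_alchemy_pool_size`, `sql_alchemy_max_overflow`, `sql_alchemy_pool_recycle`) in airflow.cfg. A minimal sketch of the idea, using a local SQLite URL as a stand-in for the real RDS DSN:

```python
from sqlalchemy import create_engine, text
from sqlalchemy.pool import QueuePool

# The SQLite URL below is just a stand-in for an RDS DSN.
engine = create_engine(
    "sqlite:///pool_demo.db",
    poolclass=QueuePool,
    pool_size=5,        # connections kept open and reused
    max_overflow=2,     # extra connections allowed during bursts
    pool_pre_ping=True,   # validate a connection before handing it out
    pool_recycle=1800,    # retire connections older than 30 minutes
)

# 100 units of work reuse one pooled connection instead of triggering
# 100 fresh DNS lookups and TCP handshakes against the endpoint.
for _ in range(100):
    with engine.connect() as conn:
        conn.execute(text("SELECT 1"))
```

After the loop, the pool still holds a single idle connection, which is exactly the behavior that keeps you under the endpoint's lookup throttle.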

Brian

On Sat, Aug 15, 2020 at 7:04 PM Ricky Shi <xiao.x.shi@gmail.com> wrote:

> Hi Everyone,
>
> We encountered a very strange issue with Airflow using AWS RDS as the
> backend. We found that when the number of tasks is large enough (>60),
> Airflow will fail with the error message (MySQL RDS backend)
>
> sqlalchemy.exc.OperationalError:
> (MySQLdb._exceptions.OperationalError) (2005, "Unknown MySQL server
> host ... $AWS RDS address)
>
> or (Postgres RDS backend):
>
> psycopg2.OperationalError: could not translate host name $AWS RDS address
>
>
> When we restart airflow, it becomes fine; and the job scheduler & website
> are both running fine. However, it will fail again after a couple of days
> of smooth running, with the same error message.
>
> We found on Stack Overflow that other people are experiencing the same
> issue, but no solution was found. Does anyone know how to resolve it?
>
> Thanks,
>
> --
> Ricky Shi
>
