airflow-dev mailing list archives

From Brian Greene <>
Subject Re: Possible bug: Airflow frequently fail with AWS RDS backend when #tasks increases
Date Sun, 16 Aug 2020 00:21:42 GMT
When I had a similar issue, it turned out that the way the task(s) were
written, they'd RAPIDLY open a large number of new RDS connections.

AWS RDS - particularly if you're using the cluster endpoint - performs a
DNS lookup (4 hops, if I recall correctly) before your connection request
actually resolves to a real host.  This lookup is throttled, and after a
certain number of hits in a short time it will return the error above
(which is annoying, as it makes it look like the DB just 'vanishes' from
time to time).
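One way to cap how quickly new connections (and therefore new DNS lookups) are opened is Airflow's SQLAlchemy pool settings in airflow.cfg. A minimal sketch, assuming an Airflow 1.10.x-era [core] layout - the specific values here are illustrative, not the poster's configuration:

```ini
[core]
# Reuse pooled connections instead of opening a fresh one per task
sql_alchemy_pool_enabled = True
sql_alchemy_pool_size = 5
sql_alchemy_max_overflow = 10
# Recycle connections before the server drops them server-side
sql_alchemy_pool_recycle = 1800
# Validate a pooled connection before handing it out, so a stale
# connection is replaced rather than surfacing as an OperationalError
sql_alchemy_pool_pre_ping = True
```

With pooling enabled, scheduler and webserver traffic resolves the RDS endpoint once per pooled connection rather than once per query burst, which keeps the lookup rate well under any DNS throttle.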


On Sat, Aug 15, 2020 at 7:04 PM Ricky Shi <> wrote:

> Hi Everyone,
> We encountered a very strange issue with Airflow using AWS RDS as the
> backend. We found that when the number of tasks is large enough (>60),
> Airflow will fail with the error message (MySQL RDS backend):
> sqlalchemy.exc.OperationalError:
> (MySQLdb._exceptions.OperationalError) (2005, "Unknown MySQL server
> host ... $AWS RDS address)
> or (Postgres RDS backend):
> psycopg2.OperationalError: could not translate host name $AWS RDS address
> When we restart Airflow, it becomes fine again, and the job scheduler &
> website both run fine. However, it fails again after a couple of days of
> smooth running, with the same error message.
> We found on Stack Overflow that other people are experiencing the same
> issue, but no solution has been found. Does anyone know how to resolve it?
> Thanks,
> --
> Ricky Shi
