airflow-dev mailing list archives

From "EKC (Erik Cederstrand)" <...@novozymes.com>
Subject Re: Celery or Dask?
Date Tue, 14 Feb 2017 11:39:29 GMT
Thanks to both for correcting my understanding. I'll see what information I can collect on
our issues and report back if I get anything coherent.


Kind regards,

Erik

________________________________
From: Jeremiah Lowin <jlowin@apache.org>
Sent: Monday, February 13, 2017 6:26:15 PM
To: dev@airflow.incubator.apache.org
Subject: Re: Celery or Dask?

As far as I know I'm the only person using Dask with Airflow at the moment.
I've been using Dask for a variety of other (non-Airflow) tasks and have
found it to be a great tool. However, it's important to note that Celery is
a much more mature project with finer control over how tasks are executed.
In fact Dask's objectives are totally different (I think of it as
"pure-Python Spark") but it happens to expose similar functionality to
Celery through its Distributed subproject.

I added a DaskExecutor to Airflow in my last commit and am working on
improving the unit tests now. I've been running the DaskExecutor in a test
environment with good results, but between the fact that you have to run
Airflow's bleeding-edge master branch to get it and that I'm the only
person kicking its tires (at the moment), I would only recommend using it
if you like to live very dangerously indeed.

In the near future, I can see Dask being a recommended way to scale Airflow
beyond a single machine due to the ease of setting it up -- but not yet.

On Mon, Feb 13, 2017 at 11:04 AM Bolke de Bruin <bdbruin@gmail.com> wrote:

Dask just landed in master. So no, Celery is the most used option to
scale out.

Always interested in what you are running into, but please be prepared to
provide a lot of info on your setup.

- Bolke

> On 13 Feb 2017, at 17:01, EKC (Erik Cederstrand) <EKC@novozymes.com> wrote:
>
> Hello all,
>
>
> I'm investigating why some of our DAGs are not being scheduled properly (ran
> into https://issues.apache.org/jira/browse/AIRFLOW-342, among other things).
> Coupled with comments on this list, I'm getting the impression that Celery is
> a second-class citizen and core developers are mainly using Dask. Is this
> correct?
>
>
> If Dask support is simply more mature and more likely to have issues
> responded to, I'll consider migrating our installation.
>
>
> Thanks,
>
> Erik
