airflow-dev mailing list archives

From Tomasz Urbaszek <turbas...@apache.org>
Subject Re: Generic Transfer Operator
Date Thu, 27 Aug 2020 08:49:54 GMT
I like the approach, as it introduces another interesting standardization of
operator interfaces. It would be awesome to hear more opinions :)

Cheers,
Tomek

On Wed, Aug 19, 2020 at 8:10 PM Jarek Potiuk <Jarek.Potiuk@polidea.com>
wrote:

> I like the idea a lot. Similar things have been discussed before, but I
> think the proposal is rather pragmatic and solves a real problem (and it
> does not seem too complex to implement).
>
> There is already some discussion about it in the document (please chime in
> if you are interested), but here are a few points why I like it:
>
> - performance and optimization are not its focus. For generic stuff it is
> usually hard to write an "optimal" solution, but once you admit you are not
> going to focus on optimisation, you come up with simpler and easier-to-use
> solutions.
>
> - on the other hand, it uses a very "Pythonic" approach built on
> Airflow's familiar concepts (connection, transfer) and could easily plug
> into the hundreds of hooks we already have, leveraging all the richness of
> Airflow's "providers".
>
> - it aims to make a "quick start" easy: if you have a number of different
> sources/targets and, as a data scientist, you would like to quickly start
> transferring data between them, you can do it easily with only basic
> Python knowledge and a simple DAG structure.
>
> - it should be possible to plug it into our new functional approach, as
> well as into future lineage discussions, since it connects sources and
> targets.
>
> - it opens up the possibility of adding simple and flexible data
> transformation on transfer. This is not a replacement for any of the
> external services that Airflow should use (Airflow is an orchestrator, not
> a data processing solution), but for the kind of quick-start scenarios I
> foresee, letting a data scientist apply simple data transformations on the
> fly might be a big plus.
>
> Suggestion: use a Pandas DataFrame as the format of the "data" component.
>
> Kamil - you should have access now.
>
> J.
>
>
> On Tue, Aug 18, 2020 at 6:53 PM Kamil Olszewski <kamil.olszewski@polidea.com>
> wrote:
>
> > Hello all,
> > at Polidea we have come up with an idea for a generic transfer operator
> > that would be able to move data between two endpoints of various types
> > (file, database, storage, etc.). Please find below a link to a short doc
> > with a POC, where we can initially discuss the design:
> > https://docs.google.com/document/d/1o7Ph7RRNqLWkTbe7xkWjb100eFaK1Apjv27LaqHgNkE/edit?usp=sharing
> > Once we reach an initial conclusion, I can create an AIP on cWiki. Can I
> > ask for permission to do so (my id is 'kamil.olszewski')? I believe that
> > during the discussion we should definitely aim for this feature to be
> > released only after Airflow 2.0 is out.
> >
> > What do you think about this idea? Would you find such an operator helpful
> > in your pipelines? Maybe you already use a similar solution or know of
> > packages that could be used to implement it?
> >
> > Best regards,
> > --
> >
> > Kamil Olszewski
> > Polidea <https://www.polidea.com> | Software Engineer
> >
> >
>
>
> --
>
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
>
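For readers who want something concrete, the pattern discussed in this thread (a source, a target, and an optional transformation applied on the fly) might be sketched roughly as follows. This is a hedged illustration only. It is not the design in the linked POC document and it uses no Airflow API. `TransferSketch`, `Source`, `Target`, `ListSource`, `ListTarget`, `Batch` and `double_values` are all hypothetical names, the in-memory classes stand in for real hooks, and batches are plain lists of dicts rather than the Pandas DataFrames suggested in the thread, so the sketch runs without extra dependencies.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterator, List, Optional, Protocol

# One "batch" of records. A list of dicts keeps the sketch dependency-free;
# under the Pandas suggestion this alias would become a pandas DataFrame.
Batch = List[Dict[str, Any]]


class Source(Protocol):
    """Hypothetical read side; a real one might wrap an existing hook."""

    def read(self) -> Iterator[Batch]: ...


class Target(Protocol):
    """Hypothetical write side; a real one might wrap an existing hook."""

    def write(self, batch: Batch) -> None: ...


@dataclass
class TransferSketch:
    """Hypothetical generic transfer (not an Airflow API): read batches from a
    source, optionally transform each one on the fly, write it to a target."""

    source: Source
    target: Target
    transform: Optional[Callable[[Batch], Batch]] = None

    def execute(self) -> int:
        moved = 0
        for batch in self.source.read():
            if self.transform is not None:
                batch = self.transform(batch)
            self.target.write(batch)
            moved += len(batch)
        return moved  # number of records written to the target


@dataclass
class ListSource:
    """In-memory stand-in for a source hook, yielding fixed-size batches."""

    rows: Batch
    batch_size: int = 2

    def read(self) -> Iterator[Batch]:
        for start in range(0, len(self.rows), self.batch_size):
            yield self.rows[start : start + self.batch_size]


@dataclass
class ListTarget:
    """In-memory stand-in for a target hook that collects written rows."""

    rows: Batch = field(default_factory=list)

    def write(self, batch: Batch) -> None:
        self.rows.extend(batch)


def double_values(batch: Batch) -> Batch:
    # A simple on-the-fly transformation; builds new dicts, source is untouched.
    return [{**row, "v": row["v"] * 2} for row in batch]


source = ListSource([{"id": 1, "v": 10}, {"id": 2, "v": 20}, {"id": 3, "v": 30}])
target = ListTarget()
moved = TransferSketch(source, target, transform=double_values).execute()
```

Connecting this to real systems would mean implementing `read` and `write` on top of existing hooks. Keeping the transformation an optional callable reflects the thread's point that on-transfer processing should stay simple rather than replace external data processing services.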
