airflow-dev mailing list archives

From Kamil Breguła <kamil.breg...@polidea.com>
Subject Re: API Reference - current confusion and improvement plan
Date Wed, 27 Mar 2019 04:50:50 GMT
Hi.

Work on this has been completed.
The new documentation is available at:
https://airflow.readthedocs.io/en/latest/_api/index.html

Greetings
Kamil Breguła

On Wed, Feb 27, 2019 at 12:51 PM Kamil Breguła
<kamil.bregula@polidea.com> wrote:
>
> Hi.
>
> Jarek Potiuk and I recently worked on finishing these changes. As a result, the
> following series of PRs was created:
>
> - [AIRFLOW-XXX][1/3] Syntax docs improvements - https://github.com/apache/airflow/pull/4789
> - [AIRFLOW-3968][2/3] Refactor base GCP hook -  https://github.com/apache/airflow/pull/4790
> - [AIRFLOW-3811][3/3] Add automatic generation of API Reference  - https://github.com/apache/airflow/pull/4788
>
> Please review them. A preview is available in the description of each PR.
>
> Greets,
> Kamil Breguła
>
> On Wed, Feb 6, 2019 at 2:09 PM Szymon Przedwojski <szymon.przedwojski@polidea.com> wrote:
>>
>> +1
>> I also like the new docs layout. The big win is that it’s generated automatically
>> from all files, so we won’t have to modify code.rst / integration.rst manually anymore.
>>
>> Szymon Przedwojski
>> Polidea | Software Engineer
>>
>> M: +48 500 330 790
>> E: szymon.przedwojski@polidea.com
>>
>> > On 5 Feb 2019, at 21:33, Ash Berlin-Taylor <ash@apache.org> wrote:
>> >
>> > I have idly wondered about something like this as a layout
>> >
>> >    from airflow.$something.aws.operators import EmrAddStepsOperator
>> >
>> > - Grouping by service provider is more helpful
>> > - Having more than one operator per module
>> > - Not having the `_operator` (etc.) suffix on the module, the class, and the
>> >   module path
>> >
>> > Perhaps a bigger change, though to make it much less painful for our users we
>> > could keep the old names with a deprecation warning or two (even past 2.0, say
>> > until 2.1). Out of scope for the current discussion.
>> >
>> > -ash
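The deprecation idea above can be sketched as a minimal example. It assumes Python 3.7+, where PEP 562 lets a module define `__getattr__`. The module names, the stand-in `EmrAddStepsOperator` class and the `_MOVED` table are all hypothetical. `types.ModuleType` is used only so that the example runs as a single file.

```python
import types
import warnings

# Hypothetical new, provider-grouped module (stand-in for something like
# airflow.$something.aws.operators from the message above).
new_module = types.ModuleType("new_layout.aws.operators")


class EmrAddStepsOperator:
    """Placeholder for the relocated operator class."""


new_module.EmrAddStepsOperator = EmrAddStepsOperator

# Hypothetical old module path, kept as a thin compatibility shim.
old_module = types.ModuleType("old_layout.emr_add_steps_operator")

# Names that moved, mapped to the module that now defines them.
_MOVED = {"EmrAddStepsOperator": new_module}


def _old_module_getattr(name):
    """PEP 562 hook: called only when normal attribute lookup on the module fails."""
    if name in _MOVED:
        target = _MOVED[name]
        warnings.warn(
            f"{old_module.__name__}.{name} is deprecated; "
            f"import it from {target.__name__} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(target, name)
    raise AttributeError(f"module {old_module.__name__!r} has no attribute {name!r}")


# Storing the function in the module's namespace activates the PEP 562 hook.
old_module.__getattr__ = _old_module_getattr
```

In a real package, the old module file would simply define a top-level `def __getattr__(name):` with the same body. `from old.path import EmrAddStepsOperator` would then keep working while emitting a `DeprecationWarning`.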
>> >
>> >> On 5 Feb 2019, at 20:22, Kamil Breguła <kamil.bregula@polidea.com> wrote:
>> >>
>> >> I think that we should group operators by service (e.g. Amazon Web Services:
>> >> Simple Storage Service). One module per service. It will be much easier to
>> >> navigate through them. A similar problem occurs with the Google Cloud
>> >> Storage service, but we have a solution (PR:
>> >> https://github.com/apache/airflow/pull/3000). Many existing operators, and
>> >> future ones written in accordance with the recommendations
>> >> (https://lists.apache.org/thread.html/e8534d82be611ae7bcb21ba371546a4278aad117d5e50361fd8f14fe@%3Cdev.airflow.apache.org%3E),
>> >> follow these rules.
>> >>
>> >> The problem will be with operators that integrate two services at the same
>> >> time. I think that we can leave them in a separate module and link to those
>> >> classes from the description of each related module.
>> >>
>> >> However, this is not an immediate problem. I just wanted to point out future
>> >> improvements that become possible if we introduce the proposed solution.
>> >>
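The grouping proposal above can be made concrete with a rough sketch that buckets module paths by service using a hand-written prefix table. The module paths are taken from the AWS list quoted further down in this thread. The `SERVICE_PREFIXES` table, the "Transfers" bucket and the helper names are hypothetical.

```python
from collections import defaultdict

# A few module paths from the AWS list quoted further down in this thread.
MODULES = [
    "airflow.contrib.operators.aws_athena_operator",
    "airflow.contrib.operators.emr_add_steps_operator",
    "airflow.contrib.operators.s3_copy_object_operator",
    "airflow.contrib.operators.sagemaker_training_operator",
    "airflow.contrib.operators.s3_to_gcs_operator",
    "airflow.sensors.s3_key_sensor",
    "airflow.contrib.sensors.emr_step_sensor",
]

# Hypothetical prefix -> service table; a real one would be curated by hand.
SERVICE_PREFIXES = {
    "aws_athena": "Amazon Athena",
    "emr": "Amazon EMR",
    "s3": "Amazon S3",
    "sagemaker": "Amazon SageMaker",
}


def service_for(module_path):
    """Guess the service a module belongs to from the last component of its path."""
    name = module_path.rsplit(".", 1)[-1]
    if "_to_" in name:
        # Operators that integrate two services stay in a separate group.
        return "Transfers"
    for prefix, service in SERVICE_PREFIXES.items():
        if name.startswith(prefix):
            return service
    return "Other"


def group_by_service(paths):
    """Return {service: [module paths]}, preserving the input order."""
    groups = defaultdict(list)
    for path in paths:
        groups[service_for(path)].append(path)
    return dict(groups)
```

The `_to_` check reflects the point above about operators that integrate two services. Without it, `s3_to_gcs_operator` would silently land under Amazon S3.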
>> >> On Tue, Feb 5, 2019 at 8:57 PM Ash Berlin-Taylor <ash@apache.org> wrote:
>> >>
>> >>> I like the API reference v2 layout a lot! Much easier to navigate and see
>> >>> what classes are available, for me at least.
>> >>>
>> >>> Documenting modules will help somewhat with a few things, but, let's say, the
>> >>> "AWS" section of the integration doc is spread across the following modules:
>> >>>
>> >>> airflow.contrib.operators.aws_athena_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/aws_athena_operator/index.html>
>> >>> airflow.contrib.operators.awsbatch_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/awsbatch_operator/index.html>
>> >>> airflow.contrib.operators.ecs_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/ecs_operator/index.html>
>> >>> airflow.contrib.operators.emr_add_steps_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/emr_add_steps_operator/index.html>
>> >>> airflow.contrib.operators.emr_create_job_flow_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/emr_create_job_flow_operator/index.html>
>> >>> airflow.contrib.operators.emr_terminate_job_flow_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/emr_terminate_job_flow_operator/index.html>
>> >>> airflow.contrib.operators.s3_copy_object_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/s3_copy_object_operator/index.html>
>> >>> airflow.contrib.operators.s3_delete_objects_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/s3_delete_objects_operator/index.html>
>> >>> airflow.contrib.operators.s3_list_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/s3_list_operator/index.html>
>> >>> airflow.contrib.operators.s3_to_gcs_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/s3_to_gcs_operator/index.html>
>> >>> airflow.contrib.operators.s3_to_gcs_transfer_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/s3_to_gcs_transfer_operator/index.html>
>> >>> airflow.contrib.operators.s3_to_sftp_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/s3_to_sftp_operator/index.html>
>> >>> airflow.contrib.operators.sagemaker_base_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/sagemaker_base_operator/index.html>
>> >>> airflow.contrib.operators.sagemaker_endpoint_config_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/sagemaker_endpoint_config_operator/index.html>
>> >>> airflow.contrib.operators.sagemaker_endpoint_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/sagemaker_endpoint_operator/index.html>
>> >>> airflow.contrib.operators.sagemaker_model_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/sagemaker_model_operator/index.html>
>> >>> airflow.contrib.operators.sagemaker_training_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/sagemaker_training_operator/index.html>
>> >>> airflow.contrib.operators.sagemaker_transform_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/sagemaker_transform_operator/index.html>
>> >>> airflow.contrib.operators.sagemaker_tuning_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/sagemaker_tuning_operator/index.html>
>> >>> airflow.contrib.operators.segment_track_event_operator <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/segment_track_event_operator/index.html>
>> >>> airflow.operators.redshift_to_s3_operator <http://level-can.surge.sh/html/autoapi/airflow/operators/redshift_to_s3_operator/index.html>
>> >>> airflow.operators.s3_file_transform_operator <http://level-can.surge.sh/html/autoapi/airflow/operators/s3_file_transform_operator/index.html>
>> >>> airflow.operators.s3_to_hive_operator <http://level-can.surge.sh/html/autoapi/airflow/operators/s3_to_hive_operator/index.html>
>> >>> airflow.operators.s3_to_redshift_operator <http://level-can.surge.sh/html/autoapi/airflow/operators/s3_to_redshift_operator/index.html>
>> >>> airflow.sensors.s3_key_sensor <http://level-can.surge.sh/html/autoapi/airflow/sensors/s3_key_sensor/index.html>
>> >>> airflow.sensors.s3_prefix_sensor <http://level-can.surge.sh/html/autoapi/airflow/sensors/s3_prefix_sensor/index.html>
>> >>> airflow.contrib.sensors.emr_base_sensor <http://level-can.surge.sh/html/autoapi/airflow/contrib/sensors/emr_base_sensor/index.html>
>> >>> airflow.contrib.sensors.emr_job_flow_sensor <http://level-can.surge.sh/html/autoapi/airflow/contrib/sensors/emr_job_flow_sensor/index.html>
>> >>> airflow.contrib.sensors.emr_step_sensor <http://level-can.surge.sh/html/autoapi/airflow/contrib/sensors/emr_step_sensor/index.html>
>> >>>
>> >>> And that was just before I got bored of looking for them :)
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>>
>> >>>> On 5 Feb 2019, at 16:25, Kamil Breguła <kamil.bregula@polidea.com> wrote:
>> >>>>
>> >>>> I already have a POC: :-)
>> >>>>
>> >>>> Available at: http://level-can.surge.sh/html/autoapi/index.html
>> >>>>
>> >>>> I would like to point out that in addition to class documentation, you can
>> >>>> also document modules:
>> >>>> http://level-can.surge.sh/html/autoapi/airflow/executors/local_executor/index.html
>> >>>> Currently, the `howto/operator.rst` file is used for this (related link:
>> >>>> https://airflow.readthedocs.io/en/latest/howto/operator.html#cloudsqlqueryoperator).
>> >>>>
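Module-level documentation can be illustrated with a short example. A module docstring is just the first string literal in the file, and it can be read from the source without importing or running the code. The sketch below shows this with the standard `ast` module. The example source and names are hypothetical. This is not how sphinx-autoapi is implemented internally; it only shows where module text lives relative to class docstrings.

```python
import ast

# Hypothetical module source: the first string literal documents the module as
# a whole, separately from the docstrings of the classes it defines.
SOURCE = '''
"""Operators for a hypothetical example service.

A module docstring like this one can describe the integration as a whole,
including guidance that would otherwise live in a separate .rst page.
"""


class ExampleOperator:
    """Docstring for a single class."""
'''

# Parse without importing or executing the code.
tree = ast.parse(SOURCE)
module_doc = ast.get_docstring(tree)
class_docs = {
    node.name: ast.get_docstring(node)
    for node in tree.body
    if isinstance(node, ast.ClassDef)
}
```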
>> >>>>
>> >>>> On Tue, Feb 5, 2019 at 5:18 PM Ash Berlin-Taylor <ash@apache.org> wrote:
>> >>>>
>> >>>>>> We want to rewrite the `integration.rst` file so that it does not contain
>> >>>>>> duplicates from `code.rst` (API Reference). As a next step, we want to
>> >>>>>> introduce API reference generation based on the source code, which will
>> >>>>>> replace the `code.rst` file.
>> >>>>>
>> >>>>> :100: Yes please!
>> >>>>>
>> >>>>>
>> >>>>> Given that a number of integrations span multiple files (n operators and
>> >>>>> m hooks), my first thought is that a section in integration.rst, or a file
>> >>>>> elsewhere in the docs/ tree, is the place to put this.
>> >>>>>
>> >>>>> On epydoc vs. a Sphinx extension, I lean very heavily towards the Sphinx
>> >>>>> extension, as we are already using much of Sphinx.
>> >>>>>
>> >>>>> Can you create a _small_ example of what you'd propose for no. 4? (I don't
>> >>>>> want you to do a lot of work that might be wasted.)
>> >>>>>
>> >>>>> -ash
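For reference, enabling sphinx-autoapi, the Sphinx option discussed here, mostly comes down to a few `conf.py` settings. This is a sketch. `autoapi_dirs`, `autoapi_root` and `autoapi_ignore` are sphinx-autoapi options, but the values shown are assumptions for illustration: the `../airflow` path, the `_api` output directory and the ignore patterns.

```python
# Excerpt of a Sphinx docs/conf.py enabling sphinx-autoapi.
# Option names come from sphinx-autoapi; the values are illustrative assumptions.

extensions = [
    # ...keep the extensions the project already uses, then add:
    "autoapi.extension",
]

# Source directories to scan, relative to the directory containing conf.py.
autoapi_dirs = ["../airflow"]

# Output directory for the generated pages (the extension's default is "autoapi").
autoapi_root = "_api"

# Glob-style patterns for files to skip; setting this replaces the default
# ["*migrations*"], so that pattern is repeated here.
autoapi_ignore = ["*migrations*", "*/example_dags/*"]
```

A normal `sphinx-build` run would then generate one page per module under `_api/`. That output location is consistent with the URL of the new documentation at the top of this thread.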
>> >>>>>
>> >>>>>
>> >>>>>> On 5 Feb 2019, at 15:55, Kamil Breguła <kamil.bregula@polidea.com> wrote:
>> >>>>>>
>> >>>>>> Hello community,
>> >>>>>>
>> >>>>>> While working on the documentation for the GCP operators, my team at
>> >>>>>> Polidea encountered some confusion related to the structure of the
>> >>>>>> documentation.
>> >>>>>>
>> >>>>>> Short story:
>> >>>>>>
>> >>>>>> We want to rewrite the `integration.rst` file so that it does not contain
>> >>>>>> duplicates from `code.rst` (API Reference). As a next step, we want to
>> >>>>>> introduce API reference generation based on the source code, which will
>> >>>>>> replace the `code.rst` file.
>> >>>>>>
>> >>>>>> Long story:
>> >>>>>>
>> >>>>>> Currently, the descriptions of operator-related classes appear in two
>> >>>>>> places in the documentation: the `code.rst` and `integration.rst` files.
>> >>>>>>
>> >>>>>> The `integration.rst` file contains information about integrations, in
>> >>>>>> particular Azure: Microsoft Azure, AWS: Amazon Web Services, Databricks,
>> >>>>>> GCP: Google Cloud Platform, and Qubole. Other integrations, however, are
>> >>>>>> not described there.
>> >>>>>>
>> >>>>>> The `code.rst` file contains the “API Reference”, which covers *all*
>> >>>>>> classes, including those already described in `integration.rst`.
>> >>>>>>
>> >>>>>> Such duplication, however, is problematic for several reasons:
>> >>>>>>
>> >>>>>> 1. Users may feel lost and may not know which section they should look
>> >>>>>>    into.
>> >>>>>> 2. Changes must be made in many places, which leads to desynchronization.
>> >>>>>>    Most often, changes are made only in the source code, so they do not
>> >>>>>>    appear in the generated documentation (for example, a newly added class
>> >>>>>>    is not shown until someone also lists it in the `.rst` files).
>> >>>>>> 3. Linking to classes with the Sphinx `:class:` role is ambiguous: if a
>> >>>>>>    class is embedded in both `integration.rst` and `code.rst` using the
>> >>>>>>    `autoclass` directive, it is not clear where the user will be taken.
>> >>>>>>
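Problem 3 can be checked mechanically. The sketch below scans the text of several `.rst` files for `autoclass` directives that lack `:noindex:` and reports classes indexed in more than one file. It assumes that directives start at the beginning of a line and that their options are indented below them. The regex and function names are illustrative and not part of any existing tool.

```python
import re
from collections import defaultdict

# An unindented autoclass directive followed by its indented option lines.
AUTOCLASS_RE = re.compile(
    r"^\.\. autoclass:: (?P<target>\S+)[ \t]*\n?"
    r"(?P<options>(?:[ \t]+:[\w-]+:.*\n?)*)",
    re.MULTILINE,
)


def indexed_classes(rst_text):
    """Yield targets of autoclass directives that do not carry :noindex:."""
    for match in AUTOCLASS_RE.finditer(rst_text):
        if ":noindex:" not in match.group("options"):
            yield match.group("target")


def ambiguous_targets(files):
    """Map each class indexed in more than one file to the list of those files."""
    seen = defaultdict(list)
    for filename, text in files.items():
        for target in indexed_classes(text):
            seen[target].append(filename)
    return {target: names for target, names in seen.items() if len(names) > 1}
```

Any class this reports would be a candidate for the `:noindex:` decision in solution 1 below.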
>> >>>>>>
>> >>>>>> There are several possible solutions:
>> >>>>>>
>> >>>>>> 1. Leave it as is. Then we need to agree on which `autoclass` directives
>> >>>>>>    should have the `:noindex:` option.
>> >>>>>> 2. Delete the duplicates from the `code.rst` file and add a note in
>> >>>>>>    `code.rst` pointing to the `integration.rst` file.
>> >>>>>> 3. Delete the duplicates from the `integration.rst` file and add a note in
>> >>>>>>    `integration.rst` pointing to the `code.rst` file.
>> >>>>>> 4. Delete the information from both files and always generate the API
>> >>>>>>    documentation based only on the source code. This solution means that
>> >>>>>>    we would have to write less documentation. There are ready-made tools
>> >>>>>>    that we can use:
>> >>>>>>    1. epydoc - http://epydoc.sourceforge.net/
>> >>>>>>    2. autoapi extension for Sphinx - https://github.com/rtfd/sphinx-autoapi
>> >>>>>>    3. other - https://wiki.python.org/moin/DocumentationTools
>> >>>>>>
>> >>>>>> The first, second, and third solutions do not solve all the problems. In
>> >>>>>> particular, we would still need to keep the `code.rst` and
>> >>>>>> `integration.rst` files complete by hand. The fourth solution solves all
>> >>>>>> of them, but it is the most complex. It is worth noting that the solutions
>> >>>>>> can be combined. For example, we can delete the duplicated information
>> >>>>>> from `integration.rst` as a short-term solution and start working on
>> >>>>>> generating the documentation from the source code as a long-term
>> >>>>>> solution. In our opinion, this is the best option.
>> >>>>>>
>> >>>>>> I have recently done some work related to this topic.
>> >>>>>>
>> >>>>>> First, I added the `:noindex:` option to the `autoclass` directives for
>> >>>>>> all operators in the `integration.rst` file. In rare cases (if any), this
>> >>>>>> caused links that previously pointed to `integration.rst` to point to
>> >>>>>> `code.rst` instead. Elements had to be linked using `:class:` instead of
>> >>>>>> a section link. Each operator is placed in a new section in this file.
>> >>>>>>
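The first change can be sketched as a small text transformation that adds `:noindex:` to every `autoclass` directive that does not already have it. This is only an illustration under simple assumptions: directives are unindented and options use a four-space indent. It is not the script used for the PR, and the names are hypothetical.

```python
import re

# An unindented autoclass directive line, followed by its indented option lines.
AUTOCLASS_RE = re.compile(
    r"^(?P<head>\.\. autoclass:: \S+[ \t]*(?:\n|$))"
    r"(?P<options>(?:[ \t]+:[\w-]+:.*\n?)*)",
    re.MULTILINE,
)


def add_noindex(rst_text, option_indent="    "):
    """Return rst_text with :noindex: added to each autoclass directive lacking it."""

    def fix(match):
        if ":noindex:" in match.group("options"):
            return match.group(0)  # already excluded from the index
        head = match.group("head")
        if not head.endswith("\n"):
            head += "\n"  # directive was the last line of the file
        # Put the new option directly under the directive, before existing options.
        return head + option_indent + ":noindex:\n" + match.group("options")

    return AUTOCLASS_RE.sub(fix, rst_text)
```

The function is idempotent: running it twice gives the same text as running it once, so it is safe to reapply after new operators are added.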
>> >>>>>> PR: https://github.com/apache/airflow/pull/4585
>> >>>>>>
>> >>>>>> Second, I added the missing classes to the `code.rst` file.
>> >>>>>>
>> >>>>>> PR: https://github.com/apache/airflow/pull/4644
>> >>>>>>
>> >>>>>> Which solution do you think is best? What steps should we take to make the
>> >>>>>> documentation more enjoyable to use?
>> >>>>>>
>> >>>>>> Greetings
>> >>>>>>
>> >>>>>> Kamil Breguła
>> >>>>>
>> >>>>>
>> >>>>
>> >>>> --
>> >>>>
>> >>>> Kamil Breguła
>> >>>> Polidea <https://www.polidea.com/> | Software Engineer
>> >>>>
>> >>>> M: +48 505 458 451
>> >>>> E: kamil.bregula@polidea.com
>> >>>>
>> >>>> We create human & business stories through technology.
>> >>>> Check out our projects! <https://www.polidea.com/our-work>
>> >>>
>> >>>
>> >>
>> >
>>
>
>


