airflow-dev mailing list archives

From Ash Berlin-Taylor <...@apache.org>
Subject Re: API Reference - current confusion and improvement plan
Date Fri, 29 Mar 2019 15:16:48 GMT
It took pulling in about another 30 commits to get it in without conflicts, but I've pulled this
into the v1-10-stable branch, so it will be in 1.10.3 too!

(There are 68 commits to the branch already since 1.10.3b1. Time for a beta2 I think)

-a

> On 29 Mar 2019, at 13:28, Jiajie Zhong <zhongjiajie955@hotmail.com> wrote:
> 
> Thanks Kamil, really a great change in our documentation
> 
> 
> Best wishes.
> -- Jiajie
> 
> ________________________________
> From: Driesprong, Fokko <fokko@driesprong.frl>
> Sent: Friday, March 29, 2019 19:16
> To: dev@airflow.apache.org
> Subject: Re: API Reference - current confusion and improvement plan
> 
> Awesome work Kamil. Thanks for giving some love to the documentation. It
> really needed some :-)
> 
> Don't forget to remove this line from the GitHub pull request template: "When
> adding new operators/hooks/sensors, the autoclass documentation generation
> needs to be added."
> https://github.com/apache/airflow/blob/master/.github/PULL_REQUEST_TEMPLATE.md
> 
> Cheers, Fokko
> 
> On Wed, 27 Mar 2019 at 05:59, Kamil Breguła <kamil.bregula@polidea.com> wrote:
> 
>> Hi.
>> 
>> Work on this has been completed.
>> New documentation is available:
>> https://airflow.readthedocs.io/en/latest/_api/index.html
>> 
>> Greetings
>> Kamil Breguła
>> 
>> On Wed, Feb 27, 2019 at 12:51 PM Kamil Breguła
>> <kamil.bregula@polidea.com> wrote:
>>> 
>>> Hi.
>>> 
>>> Jarek Potiuk and I have recently worked to finish these changes. As a
>>> result, a series of PRs was created:
>>>
>>> - [AIRFLOW-XXX][1/3] Syntax docs improvements -
>>>   https://github.com/apache/airflow/pull/4789
>>> - [AIRFLOW-3968][2/3] Refactor base GCP hook -
>>>   https://github.com/apache/airflow/pull/4790
>>> - [AIRFLOW-3811][3/3] Add automatic generation of API Reference -
>>>   https://github.com/apache/airflow/pull/4788
>>>
>>> I invite you to review them. A preview is available in the description of
>>> each PR.
>>> 
>>> Greets,
>>> Kamil Breguła
>>> 
>>> On Wed, Feb 6, 2019 at 2:09 PM Szymon Przedwojski <szymon.przedwojski@polidea.com> wrote:
>>>>
>>>> +1
>>>> I also like the new docs layout and the big win is that it’s generated
>>>> automatically from all files and we won’t have to modify code.rst /
>>>> integration.rst manually anymore.
>>>> 
>>>> Szymon Przedwojski
>>>> Polidea | Software Engineer
>>>> 
>>>> M: +48 500 330 790
>>>> E: szymon.przedwojski@polidea.com
>>>> 
>>>>> On 5 Feb 2019, at 21:33, Ash Berlin-Taylor <ash@apache.org> wrote:
>>>>> 
>>>>> I have idly wondered about something like this as a layout
>>>>> 
>>>>>   from airflow.$something.aws.operators import EmrAddStepOperator
>>>>> 
>>>>> - Grouping by service provider is more helpful
>>>>> - Having more than one operator per module
>>>>> - Not having the `_operator` (etc.) suffix on the module, and the class,
>>>>>   and the module path
>>>>>
>>>>> Perhaps a bigger change, though to make it much less painful for our
>>>>> users we could keep the old names with a deprecation warning or two (even
>>>>> past 2.0, to say 2.1). Out of scope for the current discussion.
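The "keep the old names with a deprecation warning" idea could be sketched roughly as below. This is only an illustration under assumptions: the `airflow_demo.*` module names stand in for the unspecified `airflow.$something` path, `EmrAddStepOperator` is an empty stand-in class, and `make_deprecated_module` is a hypothetical helper, not part of Airflow. It relies on module-level `__getattr__` (PEP 562, Python 3.7+).

```python
import types
import warnings


class EmrAddStepOperator:
    """Empty stand-in for a real operator class, for illustration only."""


# Hypothetical module names; the thread leaves "$something" unspecified.
NEW_MODULE = "airflow_demo.aws.operators"
OLD_MODULE = "airflow_demo.contrib.operators.emr_add_steps_operator"


def make_deprecated_module(old_name, new_module):
    """Build a module object that forwards attribute access to new_module.

    Every forwarded lookup emits a DeprecationWarning naming the new location.
    """
    shim = types.ModuleType(old_name)

    def __getattr__(attr):
        # Invoked only when normal attribute lookup on the shim fails (PEP 562).
        target = getattr(new_module, attr)  # AttributeError if truly missing
        warnings.warn(
            "{}.{} is deprecated; import it from {} instead".format(
                old_name, attr, new_module.__name__
            ),
            DeprecationWarning,
            stacklevel=2,
        )
        return target

    shim.__getattr__ = __getattr__
    return shim


# Build the "new" module by hand so the sketch is self-contained.
new_mod = types.ModuleType(NEW_MODULE)
new_mod.EmrAddStepOperator = EmrAddStepOperator
old_mod = make_deprecated_module(OLD_MODULE, new_mod)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    resolved = old_mod.EmrAddStepOperator  # old path still works, but warns
```

In a real package the shim would live in the old module file itself; building it with `types.ModuleType` here just keeps the example runnable on its own.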
>>>>> 
>>>>> -ash
>>>>> 
>>>>>> On 5 Feb 2019, at 20:22, Kamil Breguła <kamil.bregula@polidea.com> wrote:
>>>>>> 
>>>>>> I think that we should group operators by service (e.g. Amazon Web
>>>>>> Services: Simple Storage Service), one module per service. It will be
>>>>>> much easier to navigate through them. A similar problem occurs with the
>>>>>> Google Cloud Storage service, but we have a solution (PR:
>>>>>> https://github.com/apache/airflow/pull/3000 ). A large part of the
>>>>>> operators, and future operators written in accordance with the
>>>>>> recommendations (
>>>>>> https://lists.apache.org/thread.html/e8534d82be611ae7bcb21ba371546a4278aad117d5e50361fd8f14fe@%3Cdev.airflow.apache.org%3E
>>>>>> ), follow these rules.
>>>>>> 
>>>>>> The problem will be with operators that integrate two services at the
>>>>>> same time. I think that we can leave them in a separate module and link
>>>>>> to this class in the description of the module.
>>>>>>
>>>>>> However, this is not a current problem. I just wanted to point out future
>>>>>> improvements that become possible if we introduce the proposed solution.
>>>>>> 
>>>>>> On Tue, Feb 5, 2019 at 8:57 PM Ash Berlin-Taylor <ash@apache.org> wrote:
>>>>>> 
>>>>>>> I like the API reference v2 layout a lot! Much easier to navigate and see
>>>>>>> what classes are available, for me at least.
>>>>>>>
>>>>>>> Documenting modules will help somewhat with a few things but, let's say,
>>>>>>> the "AWS" section of the integration doc is across the following modules:
>>>>>>> 
>>>>>>> airflow.contrib.operators.aws_athena_operator
>>>>>>>   <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/aws_athena_operator/index.html>
>>>>>>> airflow.contrib.operators.awsbatch_operator
>>>>>>>   <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/awsbatch_operator/index.html>
>>>>>>> airflow.contrib.operators.ecs_operator
>>>>>>>   <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/ecs_operator/index.html>
>>>>>>> airflow.contrib.operators.emr_add_steps_operator
>>>>>>>   <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/emr_add_steps_operator/index.html>
>>>>>>> airflow.contrib.operators.emr_create_job_flow_operator
>>>>>>>   <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/emr_create_job_flow_operator/index.html>
>>>>>>> airflow.contrib.operators.emr_terminate_job_flow_operator
>>>>>>>   <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/emr_terminate_job_flow_operator/index.html>
>>>>>>> airflow.contrib.operators.s3_copy_object_operator
>>>>>>>   <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/s3_copy_object_operator/index.html>
>>>>>>> airflow.contrib.operators.s3_delete_objects_operator
>>>>>>>   <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/s3_delete_objects_operator/index.html>
>>>>>>> airflow.contrib.operators.s3_list_operator
>>>>>>>   <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/s3_list_operator/index.html>
>>>>>>> airflow.contrib.operators.s3_to_gcs_operator
>>>>>>>   <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/s3_to_gcs_operator/index.html>
>>>>>>> airflow.contrib.operators.s3_to_gcs_transfer_operator
>>>>>>>   <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/s3_to_gcs_transfer_operator/index.html>
>>>>>>> airflow.contrib.operators.s3_to_sftp_operator
>>>>>>>   <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/s3_to_sftp_operator/index.html>
>>>>>>> airflow.contrib.operators.sagemaker_base_operator
>>>>>>>   <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/sagemaker_base_operator/index.html>
>>>>>>> airflow.contrib.operators.sagemaker_endpoint_config_operator
>>>>>>>   <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/sagemaker_endpoint_config_operator/index.html>
>>>>>>> airflow.contrib.operators.sagemaker_endpoint_operator
>>>>>>>   <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/sagemaker_endpoint_operator/index.html>
>>>>>>> airflow.contrib.operators.sagemaker_model_operator
>>>>>>>   <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/sagemaker_model_operator/index.html>
>>>>>>> airflow.contrib.operators.sagemaker_training_operator
>>>>>>>   <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/sagemaker_training_operator/index.html>
>>>>>>> airflow.contrib.operators.sagemaker_transform_operator
>>>>>>>   <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/sagemaker_transform_operator/index.html>
>>>>>>> airflow.contrib.operators.sagemaker_tuning_operator
>>>>>>>   <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/sagemaker_tuning_operator/index.html>
>>>>>>> airflow.contrib.operators.segment_track_event_operator
>>>>>>>   <http://level-can.surge.sh/html/autoapi/airflow/contrib/operators/segment_track_event_operator/index.html>
>>>>>>> airflow.operators.redshift_to_s3_operator
>>>>>>>   <http://level-can.surge.sh/html/autoapi/airflow/operators/redshift_to_s3_operator/index.html>
>>>>>>> airflow.operators.s3_file_transform_operator
>>>>>>>   <http://level-can.surge.sh/html/autoapi/airflow/operators/s3_file_transform_operator/index.html>
>>>>>>> airflow.operators.s3_to_hive_operator
>>>>>>>   <http://level-can.surge.sh/html/autoapi/airflow/operators/s3_to_hive_operator/index.html>
>>>>>>> airflow.operators.s3_to_redshift_operator
>>>>>>>   <http://level-can.surge.sh/html/autoapi/airflow/operators/s3_to_redshift_operator/index.html>
>>>>>>> airflow.sensors.s3_key_sensor
>>>>>>>   <http://level-can.surge.sh/html/autoapi/airflow/sensors/s3_key_sensor/index.html>
>>>>>>> airflow.sensors.s3_prefix_sensor
>>>>>>>   <http://level-can.surge.sh/html/autoapi/airflow/sensors/s3_prefix_sensor/index.html>
>>>>>>> airflow.contrib.sensors.emr_base_sensor
>>>>>>>   <http://level-can.surge.sh/html/autoapi/airflow/contrib/sensors/emr_base_sensor/index.html>
>>>>>>> airflow.contrib.sensors.emr_job_flow_sensor
>>>>>>>   <http://level-can.surge.sh/html/autoapi/airflow/contrib/sensors/emr_job_flow_sensor/index.html>
>>>>>>> airflow.contrib.sensors.emr_step_sensor
>>>>>>>   <http://level-can.surge.sh/html/autoapi/airflow/contrib/sensors/emr_step_sensor/index.html>
>>>>>>> 
>>>>>>> And that was just before I got bored of looking for them :)
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> 
>>>>>>>> On 5 Feb 2019, at 16:25, Kamil Breguła <kamil.bregula@polidea.com> wrote:
>>>>>>>> 
>>>>>>>> I already have a POC: :-)
>>>>>>>> 
>>>>>>>> Available at: http://level-can.surge.sh/html/autoapi/index.html
>>>>>>>> 
>>>>>>>> I would like to point out that in addition to class documentation, you
>>>>>>>> can also document modules:
>>>>>>>> http://level-can.surge.sh/html/autoapi/airflow/executors/local_executor/index.html
>>>>>>>> Currently, the `howto/operators.rst` file is used for this (related link:
>>>>>>>> https://airflow.readthedocs.io/en/latest/howto/operator.html#cloudsqlqueryoperator
>>>>>>>> )
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Tue, Feb 5, 2019 at 5:18 PM Ash Berlin-Taylor <ash@apache.org> wrote:
>>>>>>>> 
>>>>>>>>>> We want to rewrite the `integration.rst` file so that it does not
>>>>>>>>>> contain duplicates from `code.rst` (API Reference). In the next step,
>>>>>>>>>> introduce the reference API generation based on the source code that
>>>>>>>>>> will replace the `code.rst` file.
>>>>>>>>> 
>>>>>>>>> :100: Yes please!
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Given a number of integrations are across multiple files (n operators,
>>>>>>>>> and m hooks) my first thought is that something in integration.rst, or a
>>>>>>>>> file elsewhere in the docs/ tree is the place to put this.
>>>>>>>>>
>>>>>>>>> On epydoc vs a sphinx extension I lean very heavily towards the sphinx
>>>>>>>>> extension, as we are already using much of sphinx.
>>>>>>>>>
>>>>>>>>> Can you create a _small_ example of what you'd propose for no.4 (I don't
>>>>>>>>> want you to do a lot of work that might be wasted)
>>>>>>>>> 
>>>>>>>>> -ash
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On 5 Feb 2019, at 15:55, Kamil Breguła <kamil.bregula@polidea.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> Hello community,
>>>>>>>>>> 
>>>>>>>>>> While working on the documentation for the GCP operators, my team at
>>>>>>>>>> Polidea encountered some confusion related to the structure of the
>>>>>>>>>> documentation.
>>>>>>>>>> 
>>>>>>>>>> Short story:
>>>>>>>>>> 
>>>>>>>>>> We want to rewrite the `integration.rst` file so that it does not
>>>>>>>>>> contain duplicates from `code.rst` (API Reference). In the next step,
>>>>>>>>>> introduce the reference API generation based on the source code that
>>>>>>>>>> will replace the `code.rst` file.
>>>>>>>>>> 
>>>>>>>>>> Long story:
>>>>>>>>>> 
>>>>>>>>>> Currently, the documentation contains two places where the description
>>>>>>>>>> of classes related to operators is included. They are the `code.rst`
>>>>>>>>>> and `integration.rst` files.
>>>>>>>>>>
>>>>>>>>>> The `integration.rst` file contains information about integrations, in
>>>>>>>>>> particular for Azure: Microsoft Azure, AWS: Amazon Web Services,
>>>>>>>>>> Databricks, GCP: Google Cloud Platform, Qubole. Other integrations,
>>>>>>>>>> however, do not have descriptions.
>>>>>>>>>>
>>>>>>>>>> The `code.rst` file contains the “API Reference”, which contains
>>>>>>>>>> information about *all* classes, including those included in the
>>>>>>>>>> `integration.rst` file.
>>>>>>>>>> 
>>>>>>>>>> Such duplication, however, is problematic for several reasons:
>>>>>>>>>>
>>>>>>>>>> 1. Users may feel lost and may not know which section they should look
>>>>>>>>>>    into.
>>>>>>>>>>
>>>>>>>>>> 2. Changes must be made in many places, which leads to
>>>>>>>>>>    desynchronization. Most often, changes are made only in the source
>>>>>>>>>>    code, so they do not appear in the generated documentation.
>>>>>>>>>>
>>>>>>>>>> 3. Linking to classes using the Sphinx `class` role is ambiguous: if
>>>>>>>>>>    the code is embedded both in `integration.rst` and `code.rst` using
>>>>>>>>>>    the `autoclass` directive, we’re not sure where the user
>>>>>>>>>>    will be navigated.
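The ambiguity in point 3 can be made concrete with a small check. The following is a minimal sketch, not anything from the Airflow repository: `find_ambiguous_targets` is a hypothetical helper, the sample file contents are invented for illustration, and the parsing is deliberately simplified (it only looks at `:option:` lines directly under each `autoclass` directive).

```python
import re
from collections import defaultdict

# Matches an `.. autoclass:: dotted.Name` line and captures the class path.
AUTOCLASS_RE = re.compile(r"^\s*\.\.\s+autoclass::\s+(\S+)\s*$")
# Matches an indented `:noindex:` option line.
NOINDEX_RE = re.compile(r"^\s+:noindex:\s*$")


def find_ambiguous_targets(sources):
    """Return {class_path: [file, ...]} for classes indexed in several files.

    `sources` maps a file name to its reStructuredText content.
    """
    indexed = defaultdict(list)
    for filename, text in sources.items():
        lines = text.splitlines()
        for i, line in enumerate(lines):
            match = AUTOCLASS_RE.match(line)
            if not match:
                continue
            # Scan the `:option:` lines that immediately follow the directive.
            has_noindex = False
            for option_line in lines[i + 1:]:
                if not option_line.strip().startswith(":"):
                    break
                if NOINDEX_RE.match(option_line):
                    has_noindex = True
            if not has_noindex:
                indexed[match.group(1)].append(filename)
    return {name: files for name, files in indexed.items() if len(files) > 1}


# Invented sample content: one class is indexed twice, one uses :noindex:.
SAMPLE = {
    "code.rst": (
        ".. autoclass:: airflow.contrib.operators.ecs_operator.ECSOperator\n"
    ),
    "integration.rst": (
        ".. autoclass:: airflow.contrib.operators.ecs_operator.ECSOperator\n"
        ".. autoclass:: airflow.sensors.s3_key_sensor.S3KeySensor\n"
        "    :noindex:\n"
    ),
}
ambiguous = find_ambiguous_targets(SAMPLE)
```

Any class reported here is one where a `:class:` cross-reference has two candidate targets.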
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> There are several solutions:
>>>>>>>>>>
>>>>>>>>>> 1. Leave it as is. Then we need to agree on which `autoclass` directive
>>>>>>>>>>    should have the `noindex` flag.
>>>>>>>>>>
>>>>>>>>>> 2. Delete duplicates from the `code.rst` file and add a note about the
>>>>>>>>>>    `integration.rst` file in the `code.rst` file.
>>>>>>>>>>
>>>>>>>>>> 3. Delete duplicates from the `integration.rst` file and add a note
>>>>>>>>>>    about the `code.rst` file in the `integration.rst` file.
>>>>>>>>>>
>>>>>>>>>> 4. Delete information from both files and generate the API
>>>>>>>>>>    documentation always based only on the source code. This solution
>>>>>>>>>>    means that we would have to write less documentation.
>>>>>>>>>>    There are ready tools that we can use:
>>>>>>>>>>
>>>>>>>>>>    1. epydoc - http://epydoc.sourceforge.net/ ;
>>>>>>>>>>    2. autoapi extension for Sphinx -
>>>>>>>>>>       https://github.com/rtfd/sphinx-autoapi ;
>>>>>>>>>>    3. other - https://wiki.python.org/moin/DocumentationTools
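For the Sphinx route in solution 4, enabling sphinx-autoapi comes down to a few settings in `conf.py`. The fragment below is a sketch with assumed values: the `../airflow` source path and the `_api` output directory are illustrative guesses (the latter chosen to match the `_api/index.html` URL mentioned later in the thread), not a copy of Airflow's actual configuration.

```python
# docs/conf.py (fragment): illustrative values, not Airflow's actual settings.

# Enable the sphinx-autoapi extension alongside any existing extensions.
extensions = [
    "autoapi.extension",
]

# Parse Python sources.
autoapi_type = "python"

# Directories scanned for source files, relative to this conf.py (assumed layout).
autoapi_dirs = ["../airflow"]

# Directory for the generated pages, relative to the docs root (assumed name).
autoapi_root = "_api"
```

With settings like these, the reference pages are generated from the source tree at build time, so no per-class `autoclass` entries have to be maintained by hand.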
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> The first, second, and third solutions do not solve all problems. In
>>>>>>>>>> particular, we still need to complete the `code.rst` and
>>>>>>>>>> `integration.rst` files. The fourth solution solves all problems, but
>>>>>>>>>> is the most complex. It is worth noting that mixing solutions is
>>>>>>>>>> possible. For example, we can delete information from the
>>>>>>>>>> `integration.rst` file as a short-term solution and start working on
>>>>>>>>>> creating another form of documentation as a long-term solution. This is
>>>>>>>>>> the best option in our opinion.
>>>>>>>>>> 
>>>>>>>>>> I’ve recently done a few things related to this topic.
>>>>>>>>>>
>>>>>>>>>> First, I added the `noindex` flag to the `autoclass` directives for all
>>>>>>>>>> operators in the `integration.rst` file. In rare cases (if any), this
>>>>>>>>>> caused links that were previously directed to the `integration.rst`
>>>>>>>>>> file to be redirected to the `code.rst` file. Elements had to be linked
>>>>>>>>>> using `:class:` instead of a section link. Each operator is included in
>>>>>>>>>> the new section in this file.
>>>>>>>>>> 
>>>>>>>>>> PR: https://github.com/apache/airflow/pull/4585
>>>>>>>>>> <https://github.com/apache/airflow/pull/4585/files>
>>>>>>>>>> 
>>>>>>>>>> Second, I completed the `code.rst` file with the missing classes.
>>>>>>>>>> 
>>>>>>>>>> PR: https://github.com/apache/airflow/pull/4644
>>>>>>>>>> 
>>>>>>>>>> I would like to ask which solution is the best in your opinion. What
>>>>>>>>>> steps should we take to make the documentation more enjoyable?
>>>>>>>>>> 
>>>>>>>>>> Greetings
>>>>>>>>>> 
>>>>>>>>>> Kamil Breguła
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> 
>>>>>>>> Kamil Breguła
>>>>>>>> Polidea <https://www.polidea.com/> | Software Engineer
>>>>>>>> 
>>>>>>>> M: +48 505 458 451 <+48505458451>
>>>>>>>> E: kamil.bregula@polidea.com
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
>> 
>> 

