spark-dev mailing list archives

From Maciej Szymkiewicz <mszymkiew...@gmail.com>
Subject Re: [PYTHON] PySpark typing hints
Date Tue, 23 May 2017 13:31:31 GMT
It doesn't break anything at all. You can take the stub files as-is, put
them into the PySpark root, and as long as users are not interested in
type checking, there won't be any runtime impact.
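
A minimal sketch of why a stub file next to a module has no runtime
effect: the interpreter only loads the .py file, and the .pyi file is
read by type checkers alone. The module name stub_demo and the function
double are hypothetical, made up for this illustration; they are not
part of PySpark or pyspark-stubs.

```python
import importlib
import pathlib
import sys
import tempfile

# Hypothetical module and stub contents, used only for illustration.
SOURCE = "def double(x):\n    return x * 2\n"
STUB = "def double(x: int) -> int: ...\n"

with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / "stub_demo.py").write_text(SOURCE)
    (root / "stub_demo.pyi").write_text(STUB)  # read by type checkers only

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        stub_demo = importlib.import_module("stub_demo")
    finally:
        sys.path.remove(tmp)

    # The interpreter loaded the .py file; the stub was never executed,
    # so the function still accepts anything that supports `* 2`.
    loaded_from = pathlib.Path(stub_demo.__file__).name
    doubled_int = stub_demo.double(21)
    doubled_str = stub_demo.double("ab")
```

A type checker that picks up the stub would reject the second call,
while plain execution is unaffected.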

Surprisingly, the current MyPy build (mypy==0.511) reports only one
incompatibility with Python 2 (dynamic metaclasses), which could be
resolved without significant loss of function.

On 05/23/2017 12:08 PM, Reynold Xin wrote:
> Seems useful to do. Is there a way to do this so it doesn't break
> Python 2.x?
>
>
> On Sun, May 14, 2017 at 11:44 PM, Maciej Szymkiewicz
> <mszymkiewicz@gmail.com> wrote:
>
>     Hi everyone,
>
>     For the last few months I've been working on static type
>     annotations for PySpark. For those of you who are not familiar
>     with the idea, type hints were introduced by PEP 484
>     (https://www.python.org/dev/peps/pep-0484/) and further extended
>     by PEP 526 (https://www.python.org/dev/peps/pep-0526/), with the
>     main goal of providing the information required for static
>     analysis. Right now there are a few tools which support type
>     hints, including Mypy (https://github.com/python/mypy) and PyCharm
>     (https://www.jetbrains.com/help/pycharm/2017.1/type-hinting-in-pycharm.html).
>
>     Type hints can be added using function annotations
>     (https://www.python.org/dev/peps/pep-3107/, Python 3 only),
>     docstrings, or source independent stub files
>     (https://www.python.org/dev/peps/pep-0484/#stub-files). Typing is
>     optional, gradual and has no runtime impact.
>
>     At this moment I've annotated the majority of the API, including
>     most of pyspark.sql and pyspark.ml. The project is still rough
>     around the edges and may produce both false positives and false
>     negatives, but I think it has become mature enough to be useful
>     in practice.
>
>     The current version is compatible only with Python 3, but it is
>     possible, with some limitations, to backport it to Python 2
>     (though it is not on my todo list).
>
>     There are a number of possible benefits for PySpark users and
>     developers:
>
>       * Static analysis can detect a number of common mistakes to
>         prevent runtime failures. Generic self is still fairly
>         limited, so it is more useful with DataFrames, Structured
>         Streaming and ML than with RDDs or DStreams.
>       * Annotations can be used for documenting complex signatures
>         (https://git.io/v95JN) including dependencies on arguments and
>         value (https://git.io/v95JA).
>       * Detecting possible bugs in Spark (SPARK-20631).
>       * Showing API inconsistencies.
>
>     Roadmap
>
>       * Update the project to reflect Spark 2.2.
>       * Refine existing annotations.
>
>     If there is enough interest I am happy to contribute this
>     back to Spark or submit it to Typeshed
>     (https://github.com/python/typeshed - this would require
>     formal ASF approval, and since Typeshed doesn't provide
>     versioning, it is probably not the best option in our case).
>
>     Further information:
>
>       * https://github.com/zero323/pyspark-stubs - GitHub repository
>
>       * https://speakerdeck.com/marcobonzanini/static-type-analysis-for-robust-data-products-at-pydata-london-2017
>         - interesting presentation by Marco Bonzanini
>
>     -- 
>     Best,
>     Maciej
>
>
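
On the quoted point about documenting signatures whose result depends
on the arguments, a rough sketch of how typing.overload can express
this. The function first is hypothetical and is not taken from
pyspark-stubs; the linked examples in the quoted message may look quite
different.

```python
from typing import List, Optional, overload


# Hypothetical function: the return type depends on whether a default
# is supplied. The @overload variants exist only for the type checker.
@overload
def first(items: List[str], default: None = None) -> Optional[str]: ...
@overload
def first(items: List[str], default: str) -> str: ...


def first(items, default=None):
    # Single runtime implementation shared by both overloads.
    return items[0] if items else default


with_default = first([], "fallback")   # checker infers str
without_default = first([])            # checker infers Optional[str]
non_empty = first(["a", "b"])
```

With these overloads a checker such as Mypy should accept
first([], "fallback").upper() while flagging first([]).upper(), since
the latter may be None.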

-- 
Maciej Szymkiewicz

