spark-dev mailing list archives

From Holden Karau <hol...@pigscanfly.ca>
Subject Re: [DISCUSS] Drop Python 2, 3.4 and 3.5
Date Thu, 02 Jul 2020 02:13:12 GMT
To be clear, the plan is to drop them from Spark 3.1 onwards, yes?

On Wed, Jul 1, 2020 at 7:11 PM Hyukjin Kwon <gurwls223@gmail.com> wrote:

> Hi all,
>
> I would like to discuss dropping deprecated Python versions 2, 3.4 and 3.5
> at https://github.com/apache/spark/pull/28957. I assume people support it
> in general, but I am writing this to make sure everybody is happy.
>
> Fokko did a very thorough investigation of it, see
> https://github.com/apache/spark/pull/28957#issuecomment-652022449.
> Judging from the statistics, I think we're pretty safe to drop them.
> Also note that dropping Python 2 was already declared at
> https://python3statement.org/
>
> Roughly speaking, there are several main advantages to dropping them:
>   1. It removes a bunch of hacks we added, around 700 lines in PySpark.
>   2. PyPy2 has a critical bug that causes a flaky test,
> https://issues.apache.org/jira/browse/SPARK-28358, according to my testing
> and investigation.
>   3. Users can use Python type hints with Pandas UDFs without thinking
> about the Python version.
>   4. Users can leverage a single, latest cloudpickle,
> https://github.com/apache/spark/pull/28950. With Python 3.8+ it can also
> leverage C pickle.
>   5. ...
>
> So it benefits both users and developers. WDYT guys?
>
>
> --
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
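
For illustration only: dropping Python 2, 3.4 and 3.5 as proposed in the quoted message would leave 3.6 as the oldest supported interpreter. A minimal sketch of a runtime guard for that floor might look like the following. The names MIN_PYTHON, is_supported and require_supported are hypothetical and are not taken from the Spark code base or from the linked pull request.

```python
import sys

# Assumption: 3.6 is the minimum left after dropping 2, 3.4 and 3.5.
MIN_PYTHON = (3, 6)


def is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) version meets the assumed minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def require_supported(version_info=sys.version_info):
    """Raise RuntimeError when running on an interpreter below the minimum."""
    if not is_supported(version_info):
        found = ".".join(str(part) for part in version_info[:2])
        wanted = ".".join(str(part) for part in MIN_PYTHON)
        # str.format is used instead of an f-string so that this file still
        # parses on the old interpreters the check is meant to reject.
        raise RuntimeError(
            "Python {} or newer is required, found {}".format(wanted, found)
        )


if __name__ == "__main__":
    require_supported()
    print("Python version OK")
```

Comparing tuples rather than version strings avoids ordering mistakes such as "3.10" sorting before "3.6".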
