spark-dev mailing list archives

From Reynold Xin <r...@databricks.com>
Subject Re: Should python-2 be supported in Spark 3.0?
Date Mon, 17 Sep 2018 23:26:17 GMT
i'd like to second that.

if we want to communicate timeline, we can add to the release notes saying
py2 will be deprecated in 3.0, and removed in a 3.x release.

--
excuse the brevity and lower case due to wrist injury


On Mon, Sep 17, 2018 at 4:24 PM Matei Zaharia <matei.zaharia@gmail.com>
wrote:

> That’s a good point — I’d say there’s just a risk of creating a perception
> issue. First, some users might feel that this means they have to migrate
> now, which is before Python itself drops support; they might also be
> surprised that we did this in a minor release (e.g. might we drop Python 2
> altogether in a Spark 2.5 if that later comes out?). Second, contributors
> might feel that this means new features no longer have to work with Python
> 2, which would be confusing. Maybe it’s OK on both fronts, but it just
> seems scarier for users to do this now if we do plan to have Spark 3.0 in
> the next 6 months anyway.
>
> Matei
>
> > On Sep 17, 2018, at 1:04 PM, Mark Hamstra <mark@clearstorydata.com>
> wrote:
> >
> > What is the disadvantage to deprecating now in 2.4.0? I mean, it doesn't
> change the code at all; it's just a notification that we will eventually
> cease supporting Py2. Wouldn't users prefer to get that notification sooner
> rather than later?
> >
> > On Mon, Sep 17, 2018 at 12:58 PM Matei Zaharia <matei.zaharia@gmail.com>
> wrote:
> > I’d like to understand the maintenance burden of Python 2 before
> deprecating it. Since it is not EOL yet, it might make sense to only
> deprecate it once it’s EOL (which is still over a year from now).
> Supporting Python 2+3 seems less burdensome than supporting, say, multiple
> Scala versions in the same codebase, so what are we losing out on?
> >
> > The other thing is that even though Python core devs might not support
> 2.x later, it’s quite possible that various Linux distros will if moving
> from 2 to 3 remains painful. In that case, we may want Apache Spark to
> continue releasing for it despite the Python core devs not supporting it.
> >
> > Basically, I’d suggest deprecating this in Spark 3.0 and then removing it
> later in 3.x instead of deprecating it in 2.4. I’d also consider looking at
> what other data science tools are doing before fully removing it: for
> example, if Pandas and TensorFlow no longer support Python 2 past some
> point, that might be a good point to remove it.
> >
> > Matei
> >
> > > On Sep 17, 2018, at 11:01 AM, Mark Hamstra <mark@clearstorydata.com>
> wrote:
> > >
> > > If we're going to do that, then we need to do it right now, since
> 2.4.0 is already in release candidates.
> > >
> > > On Mon, Sep 17, 2018 at 10:57 AM Erik Erlandson <eerlands@redhat.com>
> wrote:
> > > I like Mark’s concept for deprecating Py2 starting with 2.4: It may
> seem like a ways off, but even now there may be some Spark versions
> supporting Py2 past the point where Py2 is no longer receiving security
> patches.
> > >
> > >
> > > On Sun, Sep 16, 2018 at 12:26 PM Mark Hamstra <mark@clearstorydata.com>
> wrote:
> > > We could also deprecate Py2 already in the 2.4.0 release.
> > >
> > > On Sat, Sep 15, 2018 at 11:46 AM Erik Erlandson <eerlands@redhat.com>
> wrote:
> > > In case this didn't make it onto this thread:
> > >
> > > There is a 3rd option, which is to deprecate Py2 for Spark-3.0, and
> remove it entirely in a later 3.x release.
> > >
> > > On Sat, Sep 15, 2018 at 11:09 AM, Erik Erlandson <eerlands@redhat.com>
> wrote:
> > > On a separate dev@spark thread, I raised a question of whether or not
> to support python 2 in Apache Spark, going forward into Spark 3.0.
> > >
> > > Python-2 is going EOL at the end of 2019. The upcoming release of
> Spark 3.0 is an opportunity to make breaking changes to Spark's APIs, and
> so it is a good time to consider support for Python-2 on PySpark.
> > >
> > > Key advantages to dropping Python 2 are:
> > >       • Support for PySpark becomes significantly easier.
> > >       • Avoid having to support Python 2 until Spark 4.0, which is
> likely to imply supporting Python 2 for some time after it goes EOL.
> > > (Note that supporting python 2 after EOL means, among other things,
> that PySpark would be supporting a version of python that was no longer
> receiving security patches.)
> > >
> > > The main disadvantage is that PySpark users who have legacy python-2
> code would have to migrate their code to python 3 to take advantage of
> Spark 3.0.
> > >
> > > This decision obviously has large implications for the Apache Spark
> community and we want to solicit community feedback.
> > >
> > >
> >
>
>
