spark-dev mailing list archives

From Holden Karau <hol...@pigscanfly.ca>
Subject Re: [DISCUSS] Increasing minimum supported version of Pandas
Date Fri, 14 Jun 2019 18:06:15 GMT
Are there other Python dependencies we should consider upgrading at the
same time?

On Fri, Jun 14, 2019 at 7:45 PM Felix Cheung <felixcheung_m@hotmail.com>
wrote:

> So to be clear, min version check is 0.23
> Jenkins test is 0.24
>
> I’m ok with this. I hope someone will test 0.23 on releases though before
> we sign off?
>
We should maybe add this to the release instruction notes?

>
> ------------------------------
> *From:* shane knapp <sknapp@berkeley.edu>
> *Sent:* Friday, June 14, 2019 10:23:56 AM
> *To:* Bryan Cutler
> *Cc:* Dongjoon Hyun; Holden Karau; Hyukjin Kwon; dev
> *Subject:* Re: [DISCUSS] Increasing minimum supported version of Pandas
>
> excellent.  i shall not touch anything.  :)
>
> On Fri, Jun 14, 2019 at 10:22 AM Bryan Cutler <cutlerb@gmail.com> wrote:
>
>> Shane, I think 0.24.2 is probably more common right now, so if we were to
>> pick one to test against, I still think it should be that one. Our Pandas
>> usage in PySpark is pretty conservative, so it's pretty unlikely that we
>> will add something that would break 0.23.X.
>>
>> On Fri, Jun 14, 2019 at 10:10 AM shane knapp <sknapp@berkeley.edu> wrote:
>>
>>> ah, ok...  should we downgrade the testing env on jenkins then?  any
>>> specific version?
>>>
>>> shane, who is loath (and i mean LOATH) to touch python envs ;)
>>>
>>> On Fri, Jun 14, 2019 at 10:08 AM Bryan Cutler <cutlerb@gmail.com> wrote:
>>>
>>>> I should have stated this earlier, but when the user does something
>>>> that requires Pandas, the minimum version is checked against what was
>>>> imported and will raise an exception if it is a lower version. So I'm
>>>> concerned that using 0.24.2 might be a little too new for users running
>>>> older clusters. To give some release dates, 0.23.2 was released about a
>>>> year ago, 0.24.0 in January and 0.24.2 in March.
>>>>
I think that, given we’re switching to requiring Python 3 and are still a
bit of a way from cutting a release, 0.24 could be OK as a min version
requirement.
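
For anyone skimming the thread, the check Bryan describes (compare the
imported Pandas version against a minimum and raise if it is older) could
look roughly like the sketch below. This is only an illustration, not the
actual PySpark code: the names MINIMUM_PANDAS_VERSION, parse_version and
require_minimum_version are made up for this example, and 0.23.2 is just
one of the candidate values under discussion.

```python
import re

# Hypothetical minimum; the thread discusses 0.23.2 and 0.24.2 as candidates.
MINIMUM_PANDAS_VERSION = "0.23.2"


def parse_version(version):
    """Turn '0.24.2' into (0, 24, 2), keeping only leading numeric components."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def require_minimum_version(installed, minimum=MINIMUM_PANDAS_VERSION):
    """Raise ImportError if the installed version is older than the minimum."""
    if parse_version(installed) < parse_version(minimum):
        raise ImportError(
            "Pandas >= %s must be installed; however, your version was %s."
            % (minimum, installed)
        )
```

In practice the installed version would come from pandas.__version__ at the
point where a Pandas-dependent feature is first used, so users on an older
cluster would see the error only when they touch that feature.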

>
>>>>
>>>> On Fri, Jun 14, 2019 at 9:27 AM shane knapp <sknapp@berkeley.edu>
>>>> wrote:
>>>>
>>>>> just so everyone knows, our python 3.6 testing infra is currently on
>>>>> 0.24.2...
>>>>>
>>>>> On Fri, Jun 14, 2019 at 9:16 AM Dongjoon Hyun <dongjoon.hyun@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> +1
>>>>>>
>>>>>> Thank you for this effort, Bryan!
>>>>>>
>>>>>> Bests,
>>>>>> Dongjoon.
>>>>>>
>>>>>> On Fri, Jun 14, 2019 at 4:24 AM Holden Karau <holden@pigscanfly.ca>
>>>>>> wrote:
>>>>>>
>>>>>>> I’m +1 for upgrading, although since this is probably the last easy
>>>>>>> chance we’ll have to bump version numbers, I’d suggest 0.24.2
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jun 14, 2019 at 4:38 AM Hyukjin Kwon <gurwls223@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I am +1 to go for 0.23.2 - it brings some overhead to test PyArrow
>>>>>>>> and pandas combinations. Spark 3 should be a good time to increase it.
>>>>>>>>
>>>>>>>> On Fri, Jun 14, 2019 at 9:46 AM Bryan Cutler <cutlerb@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi All,
>>>>>>>>>
>>>>>>>>> We would like to discuss increasing the minimum supported version
>>>>>>>>> of Pandas in Spark, which is currently 0.19.2.
>>>>>>>>>
>>>>>>>>> Pandas 0.19.2 was released nearly 3 years ago and there are some
>>>>>>>>> workarounds in PySpark that could be removed if such an old
>>>>>>>>> version is not required. This will help to keep code clean and
>>>>>>>>> reduce maintenance effort.
>>>>>>>>>
>>>>>>>>> The change is targeted for Spark 3.0.0 release, see
>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-28041. The current
>>>>>>>>> thought is to bump the version to 0.23.2, but we would like to
>>>>>>>>> discuss before making a change. Does anyone else have thoughts
>>>>>>>>> on this?
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Bryan
>>>>>>>>>
>>>>>>>> --
>>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>>>>> https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
>>>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> Shane Knapp
>>>>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>>>>> https://rise.cs.berkeley.edu
>>>>>
>>>>
>>>
>>> --
>>> Shane Knapp
>>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>>> https://rise.cs.berkeley.edu
>>>
>>
>
> --
> Shane Knapp
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu
>
-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
