spark-dev mailing list archives

From shane knapp <skn...@berkeley.edu>
Subject Re: [DISCUSS] Increasing minimum supported version of Pandas
Date Fri, 14 Jun 2019 17:23:56 GMT
excellent.  i shall not touch anything.  :)

On Fri, Jun 14, 2019 at 10:22 AM Bryan Cutler <cutlerb@gmail.com> wrote:

> Shane, I think 0.24.2 is probably more common right now, so if we were to
> pick one to test against, I still think it should be that one. Our Pandas
> usage in PySpark is pretty conservative, so it's pretty unlikely that we
> will add something that would break 0.23.X.
>
> On Fri, Jun 14, 2019 at 10:10 AM shane knapp <sknapp@berkeley.edu> wrote:
>
>> ah, ok...  should we downgrade the testing env on jenkins then?  any
>> specific version?
>>
>> shane, who is loathe (and i mean LOATHE) to touch python envs ;)
>>
>> On Fri, Jun 14, 2019 at 10:08 AM Bryan Cutler <cutlerb@gmail.com> wrote:
>>
>>> I should have stated this earlier, but when the user does something that
>>> requires Pandas, the minimum version is checked against what was imported
>>> and will raise an exception if it is a lower version. So I'm concerned that
>>> using 0.24.2 might be a little too new for users running older clusters. To
>>> give some release dates, 0.23.2 was released about a year ago, 0.24.0 in
>>> January and 0.24.2 in March.
>>>
>>> On Fri, Jun 14, 2019 at 9:27 AM shane knapp <sknapp@berkeley.edu> wrote:
>>>
>>>> just so everyone knows, our python 3.6 testing infra is currently on
>>>> 0.24.2...
>>>>
>>>> On Fri, Jun 14, 2019 at 9:16 AM Dongjoon Hyun <dongjoon.hyun@gmail.com>
>>>> wrote:
>>>>
>>>>> +1
>>>>>
>>>>> Thank you for this effort, Bryan!
>>>>>
>>>>> Bests,
>>>>> Dongjoon.
>>>>>
>>>>> On Fri, Jun 14, 2019 at 4:24 AM Holden Karau <holden@pigscanfly.ca>
>>>>> wrote:
>>>>>
>>>>>> I’m +1 for upgrading, although since this is probably the last easy
>>>>>> chance we’ll have to bump version numbers, I’d suggest 0.24.2
>>>>>>
>>>>>>
>>>>>> On Fri, Jun 14, 2019 at 4:38 AM Hyukjin Kwon <gurwls223@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I am +1 to go for 0.23.2 - it brings some overhead to test PyArrow
>>>>>>> and pandas combinations. Spark 3 should be a good time to increase.
>>>>>>>
>>>>>>> On Fri, Jun 14, 2019 at 9:46 AM Bryan Cutler <cutlerb@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> We would like to discuss increasing the minimum supported version
>>>>>>>> of Pandas in Spark, which is currently 0.19.2.
>>>>>>>>
>>>>>>>> Pandas 0.19.2 was released nearly 3 years ago and there are some
>>>>>>>> workarounds in PySpark that could be removed if such an old version is
>>>>>>>> not required. This will help to keep code clean and reduce maintenance
>>>>>>>> effort.
>>>>>>>>
>>>>>>>> The change is targeted for Spark 3.0.0 release, see
>>>>>>>> https://issues.apache.org/jira/browse/SPARK-28041. The current
>>>>>>>> thought is to bump the version to 0.23.2, but we would like to discuss
>>>>>>>> before making a change. Does anyone else have thoughts on this?
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Bryan
>>>>>>>>
>>>>>>> --
>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>>>> https://amzn.to/2MaRAG9
>>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>>>
>>>>>
>>>>
>>>> --
>>>> Shane Knapp
>>>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>>>> https://rise.cs.berkeley.edu
>>>>
>>>
>>
>>
>

-- 
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu
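
For readers following the thread: the runtime check Bryan describes (compare the imported Pandas version against a minimum and raise an exception if it is lower) can be sketched roughly as below. This is only an illustration, not PySpark's actual code. The names `MINIMUM_PANDAS_VERSION`, `parse_version`, and `require_minimum_version` are hypothetical, the exception type and message wording are assumptions, and the 0.23.2 floor is the value proposed in the thread rather than a settled one.

```python
import re

# Assumed floor for illustration: the thread proposes 0.23.2
# (the minimum at the time of the discussion is 0.19.2).
MINIMUM_PANDAS_VERSION = "0.23.2"


def parse_version(version):
    """Turn '0.24.2' into (0, 24, 2), keeping only leading numeric components."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def require_minimum_version(installed, minimum=MINIMUM_PANDAS_VERSION):
    """Raise ImportError if `installed` is older than `minimum`."""
    if parse_version(installed) < parse_version(minimum):
        raise ImportError(
            "Pandas >= %s must be installed; however, your version was %s."
            % (minimum, installed)
        )
```

In practice the `installed` argument would come from `pandas.__version__`. Under this sketch a cluster still on 0.19.2 would fail as soon as a Pandas-dependent feature is used, which is the concern raised about setting the floor as high as 0.24.2.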
