spark-dev mailing list archives

From Hyukjin Kwon <gurwls...@gmail.com>
Subject Re: [DISCUSS] Increasing minimum supported version of Pandas
Date Sat, 15 Jun 2019 23:37:09 GMT
Oh btw, why is it 0.23.2, not 0.23.0 or 0.23.4?

On Sat, 15 Jun 2019, 06:56 Bryan Cutler, <cutlerb@gmail.com> wrote:

> Yeah, PyArrow is the only other PySpark dependency we check for a minimum
> version. We updated that not too long ago to be 0.12.1, which I think we
> are still good on for now.
>
> On Fri, Jun 14, 2019 at 11:36 AM Felix Cheung <felixcheung_m@hotmail.com>
> wrote:
>
>> How about pyArrow?
>>
>> ------------------------------
>> *From:* Holden Karau <holden@pigscanfly.ca>
>> *Sent:* Friday, June 14, 2019 11:06:15 AM
>> *To:* Felix Cheung
>> *Cc:* Bryan Cutler; Dongjoon Hyun; Hyukjin Kwon; dev; shane knapp
>> *Subject:* Re: [DISCUSS] Increasing minimum supported version of Pandas
>>
>> Are there other Python dependencies we should consider upgrading at the
>> same time?
>>
>> On Fri, Jun 14, 2019 at 7:45 PM Felix Cheung <felixcheung_m@hotmail.com>
>> wrote:
>>
>>> So to be clear, min version check is 0.23
>>> Jenkins test is 0.24
>>>
>>> I’m ok with this. I hope someone will test 0.23 on releases though
>>> before we sign off?
>>>
>> We should maybe add this to the release instruction notes?
>>
>>>
>>> ------------------------------
>>> *From:* shane knapp <sknapp@berkeley.edu>
>>> *Sent:* Friday, June 14, 2019 10:23:56 AM
>>> *To:* Bryan Cutler
>>> *Cc:* Dongjoon Hyun; Holden Karau; Hyukjin Kwon; dev
>>> *Subject:* Re: [DISCUSS] Increasing minimum supported version of Pandas
>>>
>>> excellent.  i shall not touch anything.  :)
>>>
>>> On Fri, Jun 14, 2019 at 10:22 AM Bryan Cutler <cutlerb@gmail.com> wrote:
>>>
>>>> Shane, I think 0.24.2 is probably more common right now, so if we were
>>>> to pick one to test against, I still think it should be that one. Our
>>>> Pandas usage in PySpark is pretty conservative, so it's pretty unlikely
>>>> that we will add something that would break 0.23.X.
>>>>
>>>> On Fri, Jun 14, 2019 at 10:10 AM shane knapp <sknapp@berkeley.edu>
>>>> wrote:
>>>>
>>>>> ah, ok...  should we downgrade the testing env on jenkins then?  any
>>>>> specific version?
>>>>>
>>>>> shane, who is loath (and i mean LOATH) to touch python envs ;)
>>>>>
>>>>> On Fri, Jun 14, 2019 at 10:08 AM Bryan Cutler <cutlerb@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I should have stated this earlier, but when the user does something
>>>>>> that requires Pandas, the minimum version is checked against what was
>>>>>> imported and will raise an exception if it is a lower version. So I'm
>>>>>> concerned that using 0.24.2 might be a little too new for users running
>>>>>> older clusters. To give some release dates, 0.23.2 was released about a
>>>>>> year ago, 0.24.0 in January and 0.24.2 in March.
>>>>>>
>> I think given that we’re switching to requiring Python 3 and are also a
>> bit of a way from cutting a release, 0.24 could be OK as a min version
>> requirement.
>>
>>>
>>>>>>
>>>>>> On Fri, Jun 14, 2019 at 9:27 AM shane knapp <sknapp@berkeley.edu>
>>>>>> wrote:
>>>>>>
>>>>>>> just so everyone knows, our python 3.6 testing infra is currently on
>>>>>>> 0.24.2...
>>>>>>>
>>>>>>> On Fri, Jun 14, 2019 at 9:16 AM Dongjoon Hyun <
>>>>>>> dongjoon.hyun@gmail.com> wrote:
>>>>>>>
>>>>>>>> +1
>>>>>>>>
>>>>>>>> Thank you for this effort, Bryan!
>>>>>>>>
>>>>>>>> Bests,
>>>>>>>> Dongjoon.
>>>>>>>>
>>>>>>>> On Fri, Jun 14, 2019 at 4:24 AM Holden Karau <holden@pigscanfly.ca>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I’m +1 for upgrading, although since this is probably the last
>>>>>>>>> easy chance we’ll have to bump version numbers, I’d suggest 0.24.2
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Jun 14, 2019 at 4:38 AM Hyukjin Kwon <gurwls223@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I am +1 to go for 0.23.2 - it brings some overhead to test
>>>>>>>>>> PyArrow and pandas combinations. Spark 3 should be a good time to
>>>>>>>>>> increase it.
>>>>>>>>>>
>>>>>>>>>> On Fri, Jun 14, 2019 at 9:46 AM Bryan Cutler <cutlerb@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi All,
>>>>>>>>>>>
>>>>>>>>>>> We would like to discuss increasing the minimum supported
>>>>>>>>>>> version of Pandas in Spark, which is currently 0.19.2.
>>>>>>>>>>>
>>>>>>>>>>> Pandas 0.19.2 was released nearly 3 years ago and there are some
>>>>>>>>>>> workarounds in PySpark that could be removed if such an old version
>>>>>>>>>>> is not required. This will help to keep code clean and reduce
>>>>>>>>>>> maintenance effort.
>>>>>>>>>>>
>>>>>>>>>>> The change is targeted for Spark 3.0.0 release, see
>>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-28041. The current
>>>>>>>>>>> thought is to bump the version to 0.23.2, but we would like to
>>>>>>>>>>> discuss before making a change. Does anyone else have thoughts on
>>>>>>>>>>> this?
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Bryan
>>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>>>>>>> https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
>>>>>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Shane Knapp
>>>>>>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>>>>>>> https://rise.cs.berkeley.edu
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>
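
For readers following the thread, the check Bryan describes (compare the imported Pandas version against a minimum and raise an exception if it is lower) could look roughly like the sketch below. This is an illustration only, not PySpark's actual code: the names parse_version, require_minimum_version and the two constants are hypothetical, and the version numbers are simply the ones mentioned in this thread (0.23.2 proposed for Pandas, 0.12.1 for PyArrow).

```python
import re

# Hypothetical constants; the values are the versions discussed in this thread.
MINIMUM_PANDAS_VERSION = "0.23.2"
MINIMUM_PYARROW_VERSION = "0.12.1"


def parse_version(version):
    """Turn '0.23.2' into (0, 23, 2).

    Only the leading digits of each dot-separated part are kept, and
    parsing stops at the first part that has no leading digits.
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def require_minimum_version(package, installed, minimum):
    """Raise ImportError if `installed` is older than `minimum`."""
    if parse_version(installed) < parse_version(minimum):
        raise ImportError(
            f"{package} >= {minimum} must be installed; "
            f"however, your version was {installed}."
        )


# The Jenkins version from the thread passes; the old minimum would not.
require_minimum_version("Pandas", "0.24.2", MINIMUM_PANDAS_VERSION)
```

In real use the installed string would come from the imported module (for example pandas.__version__), and the check would run only when the user reaches a code path that needs Pandas, which is why a user on an older cluster sees the error at call time rather than at install time.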
