spark-user mailing list archives

From Igor Dvorzhak <...@google.com.INVALID>
Subject Re: Apache Spark 3.1 Preparation Status (Oct. 2020)
Date Mon, 05 Oct 2020 05:35:18 GMT
Why move the code freeze to early December? Even under the changed release
cadence, it seems the code freeze should happen in mid-November.

On Sun, Oct 4, 2020 at 6:26 PM Xiao Li <gatorsmile@gmail.com> wrote:

> Apache Spark 3.1.0 should be compared with Apache Spark 2.1.0.
>
>
> I think we made a change in release cadence since Spark 2.3. See the
> commit:
> https://github.com/apache/spark-website/commit/88990968962e5cc47db8bc2c11a50742d2438daa
> Thus, Spark 3.1 might just follow the release cadence of Spark 2.3/2.4, if
> we do not want to change the release cadence?
>
> How about moving the code freeze of Spark 3.1 to *Early Dec 2020* and the
> RC1 date to *Early Jan 2021*?
>
> Thanks,
>
> Xiao
>
>
> On Sun, Oct 4, 2020 at 12:44 PM Dongjoon Hyun <dongjoon.hyun@gmail.com> wrote:
>
>> For Xiao's comment, I want to point out that Apache Spark 3.1.0 is
>> different from 2.3 or 2.4.
>>
>> Apache Spark 3.1.0 should be compared with Apache Spark 2.1.0.
>>
>> - Apache Spark 2.0.0 was released on July 26, 2016.
>> - Apache Spark 2.1.0 was released on December 28, 2016.
>>
>> Bests,
>> Dongjoon.
>>
>>
>> On Sun, Oct 4, 2020 at 10:53 AM Dongjoon Hyun <dongjoon.hyun@gmail.com>
>> wrote:
>>
>>> Thank you all.
>>>
>>> BTW, Xiao and Mridul, I'm wondering what date you have in your mind
>>> specifically.
>>>
>>> Usually, `Christmas and New Year season` doesn't give us much additional
>>> time.
>>>
>>> If you think so, could you make a PR for Apache Spark website according
>>> to your expectation?
>>>
>>> https://spark.apache.org/versioning-policy.html
>>>
>>> Bests,
>>> Dongjoon.
>>>
>>>
>>> On Sun, Oct 4, 2020 at 7:18 AM Mridul Muralidharan <mridul@gmail.com>
>>> wrote:
>>>
>>>>
>>>> +1 on pushing the branch cut for increased dev time to match previous
>>>> releases.
>>>>
>>>> Regards,
>>>> Mridul
>>>>
>>>> On Sat, Oct 3, 2020 at 10:22 PM Xiao Li <gatorsmile@gmail.com> wrote:
>>>>
>>>>> Thank you for your updates.
>>>>>
>>>>> Spark 3.0 got released on Jun 18, 2020. If Nov 1st is the target date
>>>>> of the 3.1 branch cut, the feature development time window is less than
>>>>> 5 months. This is shorter than what we did in Spark 2.3 and 2.4 releases.
>>>>>
>>>>> Below are three highly desirable features I am watching.
>>>>> Hopefully, we can finish them before the branch cut.
>>>>>
>>>>>    - Support push-based shuffle to improve shuffle efficiency:
>>>>>    https://issues.apache.org/jira/browse/SPARK-30602
>>>>>    - Unify create table syntax:
>>>>>    https://issues.apache.org/jira/browse/SPARK-31257
>>>>>    - Bloom filter join:
>>>>>    https://issues.apache.org/jira/browse/SPARK-32268
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Xiao
>>>>>
>>>>>
>>>>> On Sat, Oct 3, 2020 at 5:41 PM Hyukjin Kwon <gurwls223@gmail.com> wrote:
>>>>>
>>>>>> Nice summary. Thanks Dongjoon. One minor correction -> I believe we
>>>>>> dropped R 3.5 and below at branch 2.4 as well.
>>>>>>
>>>>>> On Sun, 4 Oct 2020, 09:17 Dongjoon Hyun, <dongjoon.hyun@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi, All.
>>>>>>>
>>>>>>> As of today, the master branch (Apache Spark 3.1.0) has resolved
>>>>>>> 852+ JIRA issues, and 606+ issues are 3.1.0-only patches.
>>>>>>> According to the 3.1.0 release window, branch-3.1 will be
>>>>>>> created on November 1st and enter the QA period.
>>>>>>>
>>>>>>> Here are some notable updates I've been monitoring.
>>>>>>>
>>>>>>> *Language*
>>>>>>> 01. SPARK-25075 Support Scala 2.13
>>>>>>>       - Since SPARK-32926, Scala 2.13 build test has
>>>>>>>         become a part of GitHub Action jobs.
>>>>>>>       - After SPARK-33044, Scala 2.13 test will be
>>>>>>>         a part of Jenkins jobs.
>>>>>>> 02. SPARK-29909 Drop Python 2 and Python 3.4 and 3.5
>>>>>>> 03. SPARK-32082 Project Zen: Improving Python usability
>>>>>>>       - 7 of 16 issues are resolved.
>>>>>>> 04. SPARK-32073 Drop R < 3.5 support
>>>>>>>       - This is done for Spark 3.0.1 and 3.1.0.
>>>>>>>
>>>>>>> *Dependency*
>>>>>>> 05. SPARK-32058 Use Apache Hadoop 3.2.0 dependency
>>>>>>>       - This changes the default dist. for better cloud support
>>>>>>> 06. SPARK-32981 Remove hive-1.2 distribution
>>>>>>> 07. SPARK-20202 Remove references to org.spark-project.hive
>>>>>>>       - This will remove Hive 1.2.1 from source code
>>>>>>> 08. SPARK-29250 Upgrade to Hadoop 3.2.1 (WIP)
>>>>>>>
>>>>>>> *Core*
>>>>>>> 09. SPARK-27495 Support Stage level resource conf and scheduling
>>>>>>>       - 11 of 15 issues are resolved
>>>>>>> 10. SPARK-25299 Use remote storage for persisting shuffle data
>>>>>>>       - 8 of 14 issues are resolved
>>>>>>>
>>>>>>> *Resource Manager*
>>>>>>> 11. SPARK-33005 Kubernetes GA preparation
>>>>>>>       - It is on the way and we are waiting for more feedback.
>>>>>>>
>>>>>>> *SQL*
>>>>>>> 12. SPARK-30648/SPARK-32346 Support filters pushdown
>>>>>>>       to JSON/Avro
>>>>>>> 13. SPARK-32948/SPARK-32958 Add Json expression optimizer
>>>>>>> 14. SPARK-12312 Support JDBC Kerberos w/ keytab
>>>>>>>       - 11 of 17 issues are resolved
>>>>>>> 15. SPARK-27589 DSv2 was mostly completed in 3.0 and gained
>>>>>>>       more features in 3.1, but the following gaps remain:
>>>>>>>       - All built-in DataSource v2 write paths are disabled
>>>>>>>         and v1 write is used instead.
>>>>>>>       - Support partition pruning with subqueries
>>>>>>>       - Support bucketing
>>>>>>>
>>>>>>> We still have one month before the feature freeze
>>>>>>> and the start of QA. If you are working on 3.1,
>>>>>>> please consider the timeline and share your schedule
>>>>>>> with the Apache Spark community. For the other stuff,
>>>>>>> we can put it into the 3.2 release scheduled for June 2021.
>>>>>>>
>>>>>>> Last but not least, I want to emphasize (7) once again.
>>>>>>> We need to remove the forked unofficial Hive eventually.
>>>>>>> Please let us know your reasons if you need to build
>>>>>>> from Apache Spark 3.1 source code for Hive 1.2.
>>>>>>>
>>>>>>> https://github.com/apache/spark/pull/29936
>>>>>>>
>>>>>>> As I wrote in the above PR description, for old releases,
>>>>>>> Apache Spark 2.4 (LTS) and 3.0 (~2021.12) will provide
>>>>>>> Hive 1.2-based distribution.
>>>>>>>
>>>>>>> Bests,
>>>>>>> Dongjoon.
>>>>>>>
>>>>>>
