spark-dev mailing list archives

From Dongjoon Hyun <dongjoon.h...@gmail.com>
Subject Re: Apache Spark 3.1 Preparation Status (Oct. 2020)
Date Sun, 04 Oct 2020 17:53:01 GMT
Thank you all.

BTW, Xiao and Mridul, I'm wondering what specific date you have in mind.

Usually, the `Christmas and New Year season` doesn't give us much additional
time.

If you think so, could you make a PR for Apache Spark website according to
your expectation?

https://spark.apache.org/versioning-policy.html

Bests,
Dongjoon.


On Sun, Oct 4, 2020 at 7:18 AM Mridul Muralidharan <mridul@gmail.com> wrote:

>
> +1 on pushing the branch cut for increased dev time to match previous
> releases.
>
> Regards,
> Mridul
>
> On Sat, Oct 3, 2020 at 10:22 PM Xiao Li <gatorsmile@gmail.com> wrote:
>
>> Thank you for your updates.
>>
>> Spark 3.0 got released on Jun 18, 2020. If Nov 1st is the target date of
>> the 3.1 branch cut, the feature development time window is less than 5
>> months. This is shorter than what we did in Spark 2.3 and 2.4 releases.
>>
>> Below are three highly desirable features I am watching. Hopefully,
>> we can finish them before the branch cut.
>>
>>    - Support push-based shuffle to improve shuffle efficiency:
>>    https://issues.apache.org/jira/browse/SPARK-30602
>>    - Unify create table syntax:
>>    https://issues.apache.org/jira/browse/SPARK-31257
>>    - Bloom filter join: https://issues.apache.org/jira/browse/SPARK-32268
>>
>> Thanks,
>>
>> Xiao
>>
>>
>> On Sat, Oct 3, 2020 at 5:41 PM Hyukjin Kwon <gurwls223@gmail.com> wrote:
>>
>>> Nice summary. Thanks Dongjoon. One minor correction -> I believe we
>>> dropped R 3.5 and below at branch 2.4 as well.
>>>
>>> On Sun, 4 Oct 2020, 09:17 Dongjoon Hyun, <dongjoon.hyun@gmail.com>
>>> wrote:
>>>
>>>> Hi, All.
>>>>
>>>> As of today, the master branch (Apache Spark 3.1.0) has resolved
>>>> 852+ JIRA issues, of which 606+ are 3.1.0-only patches.
>>>> According to the 3.1.0 release window, branch-3.1 will be
>>>> created on November 1st and enter the QA period.
>>>>
>>>> Here are some notable updates I've been monitoring.
>>>>
>>>> *Language*
>>>> 01. SPARK-25075 Support Scala 2.13
>>>>       - Since SPARK-32926, Scala 2.13 build test has
>>>>         become a part of GitHub Action jobs.
>>>>       - After SPARK-33044, Scala 2.13 test will be
>>>>         a part of Jenkins jobs.
>>>> 02. SPARK-29909 Drop Python 2 and Python 3.4 and 3.5
>>>> 03. SPARK-32082 Project Zen: Improving Python usability
>>>>       - 7 of 16 issues are resolved.
>>>> 04. SPARK-32073 Drop R < 3.5 support
>>>>       - This is done for Spark 3.0.1 and 3.1.0.
>>>>
>>>> *Dependency*
>>>> 05. SPARK-32058 Use Apache Hadoop 3.2.0 dependency
>>>>       - This changes the default dist. for better cloud support
>>>> 06. SPARK-32981 Remove hive-1.2 distribution
>>>> 07. SPARK-20202 Remove references to org.spark-project.hive
>>>>       - This will remove Hive 1.2.1 from source code
>>>> 08. SPARK-29250 Upgrade to Hadoop 3.2.1 (WIP)
>>>>
>>>> *Core*
>>>> 09. SPARK-27495 Support Stage level resource conf and scheduling
>>>>       - 11 of 15 issues are resolved
>>>> 10. SPARK-25299 Use remote storage for persisting shuffle data
>>>>       - 8 of 14 issues are resolved
>>>>
>>>> *Resource Manager*
>>>> 11. SPARK-33005 Kubernetes GA preparation
>>>>       - It is on the way and we are waiting for more feedback.
>>>>
>>>> *SQL*
>>>> 12. SPARK-30648/SPARK-32346 Support filters pushdown
>>>>       to JSON/Avro
>>>> 13. SPARK-32948/SPARK-32958 Add Json expression optimizer
>>>> 14. SPARK-12312 Support JDBC Kerberos w/ keytab
>>>>       - 11 of 17 issues are resolved
>>>> 15. SPARK-27589 DSv2 was mostly completed in 3.0
>>>>       and gained more features in 3.1, but gaps remain:
>>>>       - All built-in DataSource v2 write paths are disabled
>>>>         and v1 write is used instead.
>>>>       - Support partition pruning with subqueries
>>>>       - Support bucketing
>>>>
>>>> We still have one month before the feature freeze
>>>> and the start of QA. If you are working on 3.1,
>>>> please consider the timeline and share your schedule
>>>> with the Apache Spark community. Anything else can go
>>>> into the 3.2 release scheduled for June 2021.
>>>>
>>>> Last but not least, I want to emphasize (7) once again.
>>>> We need to remove the forked unofficial Hive eventually.
>>>> Please let us know your reasons if you need to build
>>>> from Apache Spark 3.1 source code for Hive 1.2.
>>>>
>>>> https://github.com/apache/spark/pull/29936
>>>>
>>>> As I wrote in the above PR description, for old releases,
>>>> Apache Spark 2.4 (LTS) and 3.0 (~2021.12) will provide
>>>> a Hive 1.2-based distribution.
>>>>
>>>> Bests,
>>>> Dongjoon.
>>>>
>>>
